Preface


About This Book 

This book is part of a multivolume work entitled the AMD64 Architecture Programmer's Manual. The
following table lists each volume and its order number.


Title                                                         Order No.
Volume 1: Application Programming                             24592
Volume 2: System Programming                                  24593
Volume 3: General-Purpose and System Instructions             24594
Volume 4: 128-Bit and 256-Bit Media Instructions              26568
Volume 5: 64-Bit Media and x87 Floating-Point Instructions    26569


Audience 

This volume (Volume 5) is intended for all programmers writing application or system software for a 
processor that implements the AMD64 processor architecture. 

Organization 

Volumes 3, 4, and 5 describe the AMD64 instruction set in detail. Together, they cover each 
instruction’s mnemonic syntax, opcodes, functions, affected flags, and possible exceptions. 

The AMD64 instruction set is divided into five subsets: 

• General-purpose instructions 

• System instructions 

• 128-bit and 256-bit media instructions (Streaming SIMD Extensions - SSE) 

• 64-bit media instructions (MMX™) 

• x87 floating-point instructions 

A number of instructions belong to—and are described identically in—multiple instruction subsets. 

This volume describes the 64-bit media and x87 floating-point instructions. The index at the end cross- 
references topics within this volume. For other topics relating to the AMD64 architecture, and for 
information on instructions in other subsets, see the tables of contents and indexes of the other 
volumes. 
Conventions and Definitions

The following section, Notational Conventions, describes notational conventions used in this volume
and in the remaining volumes of this AMD64 Architecture Programmer's Manual. It is followed by
a Definitions section, which lists a number of terms used in the manual along with their technical
definitions. Finally, the Registers section lists the registers that are part of the application
programming model.

Notational Conventions 

#GP(0) 

An instruction exception. In this example, a general-protection exception with an error code of 0.

1011b

A binary value—in this example, a 4-bit value. 

F0EA_0B02h 

A hexadecimal value. Underscore characters may be inserted to improve readability. 


128 

Numbers without an alpha suffix are decimal unless the context indicates otherwise. 


7:4 

A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first. Commas may be inserted 
to indicate gaps. 

CPUID FnXXXX_XXXX_RRR[FieldName]

Support for optional features or the value of an implementation-specific parameter of a processor
can be discovered by executing the CPUID instruction on that processor. To obtain this value,
software must execute the CPUID instruction with the function code XXXX_XXXXh in EAX and
then examine the field FieldName returned in register RRR. If the “_RRR” notation is followed by
“_xYYY”, register ECX must be set to the value YYYh before executing CPUID. When FieldName
is not given, the entire contents of register RRR contain the desired value. When determining
optional feature support, if the bit identified by FieldName is set to a one, the feature is supported
on that processor.
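
As an illustration of how this notation decomposes, the following Python sketch splits a feature reference into the EAX input, the optional ECX input, the result register, and the field name. The function name parse_cpuid_notation and the returned dictionary keys are hypothetical, chosen only for this example; the sketch does not execute CPUID.

```python
import re

# Hypothetical helper for illustration only: it parses the notation,
# it does not run the CPUID instruction.
_PATTERN = re.compile(
    r"^CPUID\s+Fn([0-9A-Fa-f]{4})_([0-9A-Fa-f]{4})"  # function code loaded into EAX
    r"_(E[ABCD]X)"                                    # result register RRR
    r"(?:_x([0-9A-Fa-f]+))?"                          # optional ECX input (YYYh)
    r"(?:\[(\w+)\])?$"                                # optional field name
)


def parse_cpuid_notation(text):
    """Split 'CPUID FnXXXX_XXXX_RRR[_xYYY][FieldName]' into its parts."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a CPUID feature reference: {text!r}")
    high, low, register, ecx_in, field = match.groups()
    return {
        "eax_in": int(high + low, 16),
        "ecx_in": int(ecx_in, 16) if ecx_in is not None else None,
        "register": register,
        "field": field,
    }
```

For example, parse_cpuid_notation("CPUID Fn0000_0001_EDX[SSE2]") reports that EAX must hold 1 and that the SSE2 field is read from EDX.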

CR0-CR4

A register range, from register CR0 through CR4, inclusive, with the low-order register first.

CR0[PE], CR0.PE

Notation for referring to a field within a register. In this case, the PE field of the CR0 register.

CR0[PE] = 1, CR0.PE = 1

The PE field of the CR0 register is set (contains the value 1).

EFER[LME] = 0, EFER.LME = 0

The LME field of the EFER register is cleared (contains a value of 0).

DS:SI 

A far pointer or logical address. The real address or segment descriptor specified by the segment 
register (DS in this example) is combined with the offset contained in the second register (SI in this 
example) to form a real or virtual address. 

RFLAGS[13:12] 

A field within a register identified by its bit range. In this example, corresponding to the IOPL 
field. 
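
The bit-range notation maps directly onto a shift and a mask. The following Python sketch is a minimal illustration; the helper name bit_field and the sample rFLAGS value are assumptions made for this example.

```python
def bit_field(value, high, low):
    """Return bits high:low (inclusive) of value, right-justified."""
    if high < low or low < 0:
        raise ValueError("expected high >= low >= 0")
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


# Sample value only: bits 13:12 (IOPL) = 11b, bit 9 = 1, bit 1 = 1.
rflags = 0x3202
iopl = bit_field(rflags, 13, 12)
```

With this sample value, RFLAGS[13:12] evaluates to 3.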

Definitions 

Many of the following definitions assume an in-depth knowledge of the legacy x86 architecture. See 
“Related Documents” on page xxviii for descriptions of the legacy x86 architecture. 

128-bit media instructions 

Instructions that operate on the various 128-bit vector data types. Supported within both the legacy 
SSE and extended SSE instruction sets. 

256-bit media instructions 

Instructions that operate on the various 256-bit vector data types. Supported within the extended 
SSE instruction set. 

64-bit media instructions 

Instructions that operate on the 64-bit vector data types. These are primarily a combination of
MMX and 3DNow!™ instruction sets and their extensions, with some additional instructions from
the SSE1 and SSE2 instruction sets.

16-bit mode 

Legacy mode or compatibility mode in which a 16-bit address size is active. See legacy mode and 
compatibility mode. 

32-bit mode 

Legacy mode or compatibility mode in which a 32-bit address size is active. See legacy mode and 
compatibility mode. 

64-bit mode 

A submode of long mode. In 64-bit mode, the default address size is 64 bits and new features, such 
as register extensions, are supported for system and application software. 

absolute 

Said of a displacement that references the base of a code segment rather than an instruction pointer. 
Contrast with relative. 
AES

Advanced Encryption Standard (AES) algorithm acceleration instructions; part of Streaming SIMD
Extensions (SSE).

ASID 

Address space identifier. 

AVX 

Extension of the SSE instruction set supporting 256-bit vector (packed) operands. See Streaming 
SIMD Extensions. 

biased exponent 

The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data 
type. The bias makes the range of the biased exponent always positive, which allows reciprocation 
without overflow. 

byte 

Eight bits.

clear

To write a bit value of 0. Compare set.

compatibility mode

A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16- 
bit and 32-bit applications run without modification. 

commit 

To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a 
register (including flags), the data cache, an internal write buffer, or memory. 

CPL 

Current privilege level.

direct

Referencing a memory location whose address is included in the instruction’s syntax as an 
immediate operand. The address may be an absolute or relative address. Compare indirect. 

dirty data 

Data held in the processor’s caches or internal buffers that is more recent than the copy held in 
main memory. 

displacement 

A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer 
(relative addressing). Same as offset. 
doubleword

Two words, or four bytes, or 32 bits.

double quadword

Eight words, or 16 bytes, or 128 bits. Also called octword.

effective address size

The address size for the current instruction after accounting for the default address size and any 
address-size override prefix. 

effective operand size 

The operand size for the current instruction after accounting for the default operand size and any 
operand-size override prefix. 

element 

See vector. 

exception 

An abnormal condition that occurs as the result of executing an instruction. The processor’s 
response to an exception depends on the type of the exception. For all exceptions except SSE 
floating-point exceptions and x87 floating-point exceptions, control is transferred to the handler 
(or service routine) for that exception, as defined by the exception’s vector. For floating-point 
exceptions defined by the IEEE 754 standard, there are both masked and unmasked responses. 
When unmasked, the exception handler is called, and when masked, a default response is provided 
instead of calling the handler. 

extended SSE 

Enhanced set of SIMD instructions supporting 256-bit vector data types and allowing the 
specification of up to four operands. A subset of the Streaming SIMD Extensions (SSE). Includes 
the AVX, FMA, FMA4, and XOP instructions. Compare legacy SSE.

flush 

An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache
line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.” 

FMA4 

Fused Multiply Add, four operand. Part of the extended SSE instruction set. 

FMA 

Fused Multiply Add. Part of the extended SSE instruction set. 

GDT 

Global descriptor table. 
GIF

Global interrupt flag. 

IDT 

Interrupt descriptor table. 

IGN 

Ignored. Value written is ignored by hardware. Value returned on a read is indeterminate. See 
reserved. 

indirect 

Referencing a memory location whose address is in a register or other memory location. The 
address may be an absolute or relative address. Compare direct. 

IRB 

The virtual-8086 mode interrupt-redirection bitmap. 

IST

The long-mode interrupt-stack table. 

IVT 

The real-address mode interrupt-vector table. 

LDT 

Local descriptor table.

legacy x86

The legacy x86 architecture. See “Related Documents” on page xxviii for descriptions of the 
legacy x86 architecture. 

legacy mode 

An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and 
operating systems run without modification. A processor implementation of the AMD64 
architecture can run in either long mode or legacy mode. Legacy mode has three submodes, real 
mode, protected mode, and virtual-8086 mode. 

legacy SSE 

A subset of the Streaming SIMD Extensions (SSE) composed of the SSE1, SSE2, SSE3, SSSE3,
SSE4.1, SSE4.2, and SSE4A instruction sets. Compare extended SSE.

long mode 

An operating mode unique to the AMD64 architecture. A processor implementation of the 
AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes, 
64-bit mode and compatibility mode. 
lsb

Least-significant bit. 

LSB 

Least-significant byte.

main memory

Physical memory, such as RAM and ROM (but not cache memory) that is installed in a particular 
computer system. 

mask 

(1) A control bit that prevents the occurrence of a floating-point exception from invoking an 
exception-handling routine. (2) A field of bits used for a control purpose. 

MBZ 

Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP) 
occurs. See reserved. 

memory 

Unless otherwise specified, main memory.

msb

Most-significant bit. 

MSB 

Most-significant byte.

multimedia instructions

Those instructions that operate simultaneously on multiple elements within a vector data type. 
Comprises the 256-bit media instructions, 128-bit media instructions, and 64-bit media 
instructions. 

octword 

Same as double quadword.

offset

Same as displacement.

overflow

The condition in which a floating-point number is larger in magnitude than the largest, finite, 
positive or negative number that can be represented in the data-type format being used. 

packed 

See vector. 


AMD64 Technology 


26569—Rev. 3.15—May 2018 


PAE 

Physical-address extensions.

physical memory

Actual memory, consisting of main memory and cache.

probe

A check for an address in a processor’s caches or internal buffers. External probes originate 
outside the processor, and internal probes originate within the processor. 

protected mode 

A submode of legacy mode. 

quadword 

Four words, or eight bytes, or 64 bits. 

RAZ 

Read as zero. Value returned on a read is always zero (0) regardless of what was previously
written. See reserved.

real-address mode

See real mode.

real mode

A short name for real-address mode, a submode of legacy mode.

relative

Referencing with a displacement (also called offset) from an instruction pointer rather than the 
base of a code segment. Contrast with absolute. 

reserved 

Fields marked as reserved may be used at some future time. 

To preserve compatibility with future processors, reserved fields require special handling when 
read or written by software. Software must not depend on the state of a reserved field (unless 
qualified as RAZ), nor upon the ability of such fields to return a previously written state. 

If a field is marked reserved without qualification, software must not change the state of that field; 
it must reload that field with the same value returned from a prior read. 

Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions). 

REX 

An instruction encoding prefix that specifies a 64-bit operand size and provides access to 
additional registers. 

RIP-relative addressing 

Addressing relative to the 64-bit RIP instruction pointer. 
SBZ

Should be zero. An attempt by software to set an SBZ bit to 1 results in undefined behavior. See 
reserved. 

scalar 

An atomic value existing independently of any specification of location, direction, etc., as opposed 
to vectors. 


set 

To write a bit value of 1. Compare clear. 

SIB 

A byte following an instruction opcode that specifies address calculation based on scale (S), index 
(I), and base (B). 

SIMD 

Single instruction, multiple data. See vector. 

Streaming SIMD Extensions (SSE) 

Instructions that operate on scalar or vector (packed) integer and floating point numbers. The SSE 
instruction set comprises the legacy SSE and extended SSE instruction sets. 

SSE1 

Original SSE instruction set. Includes instructions that operate on vector operands in both the 
MMX and the XMM registers. 

SSE2 

Extensions to the SSE instruction set. 

SSE3 

Further extensions to the SSE instruction set. 

SSSE3 

Further extensions to the SSE instruction set. 

SSE4.1 

Further extensions to the SSE instruction set. 

SSE4.2 

Further extensions to the SSE instruction set. 

SSE4A 

A minor extension to the SSE instruction set adding the instructions EXTRQ, INSERTQ,
MOVNTSS, and MOVNTSD.
sticky bit

A bit that is set or cleared by hardware and that remains in that state until explicitly changed by 
software. 

TOP 

The x87 top-of-stack pointer. 

TSS 

Task-state segment.

underflow

The condition in which a floating-point number is smaller in magnitude than the smallest nonzero, 
positive or negative number that can be represented in the data-type format being used. 

vector 

(1) A set of integer or floating-point values, called elements, that are packed into a single operand. 
Most of the media instructions support vectors as operands. Vectors are also called packed or 
SIMD (single-instruction multiple-data) operands. 

(2) An index into an interrupt descriptor table (IDT), used to access exception handlers. Compare 
exception. 

VEX 

An instruction encoding escape prefix that opens a new extended instruction encoding space,
specifies a 64-bit operand size, and provides access to additional registers. See XOP prefix.

virtual-8086 mode 

A submode of legacy mode. 

VMCB 

Virtual machine control block. 

VMM 

Virtual machine monitor.

word

Two bytes, or 16 bits. 

XOP instructions 

Part of the extended SSE instruction set using the XOP prefix. See Streaming SIMD Extensions.

XOP prefix

Extended instruction identifier prefix, used by XOP instructions allowing the specification of up to 
four operands and 128 or 256-bit operand widths. 
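
The biased exponent entry in the list above can be made concrete with a short Python sketch. It assumes the IEEE 754 double-precision format, whose bias is 1023; the function names are hypothetical and exist only for this illustration.

```python
import struct

DOUBLE_BIAS = 1023  # exponent bias of the IEEE 754 double-precision format


def biased_exponent(value):
    """Return the 11-bit biased exponent field of a double-precision value."""
    bits = int.from_bytes(struct.pack(">d", value), "big")
    return (bits >> 52) & 0x7FF


def true_exponent(value):
    """Subtract the bias to recover the exponent of a normalized value."""
    return biased_exponent(value) - DOUBLE_BIAS
```

For 0.5, whose true exponent is -1, the stored field is 1022, so the stored value stays non-negative even for negative exponents.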
Registers

In the following list of registers, the names are used to refer either to a given register or to the contents 
of that register: 

AH-DH 

The high 8-bit AH, BH, CH, and DH registers. Compare AL-DL. 

AL-DL 

The low 8-bit AL, BL, CL, and DL registers. Compare AH-DH. 

AL-r15B

The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and R8B-R15B registers, available in 64-bit 
mode. 


BP 

Base pointer register. 

CRn

Control register number n. 


CS 

Code segment register.

eAX-eSP

The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers. Compare rAX-rSP.

EBP 

Extended base pointer register. 

EFER 

Extended features enable register.

eFLAGS

16-bit or 32-bit flags register. Compare rFLAGS. 

EFLAGS 

32-bit (extended) flags register.

eIP

16-bit or 32-bit instruction-pointer register. Compare rIP. 

EIP 

32-bit (extended) instruction-pointer register. 
FLAGS

16-bit flags register. 

GDTR 

Global descriptor table register. 

GPRs 

General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP. 
For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit 
data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8-R15. 

IDTR 

Interrupt descriptor table register. 


IP 

16-bit instruction-pointer register. 

LDTR 

Local descriptor table register. 

MSR 

Model-specific register.

r8-r15

The 8-bit R8B-R15B registers, or the 16-bit R8W-R15W registers, or the 32-bit R8D-R15D 
registers, or the 64-bit R8-R15 registers. 

rAX-rSP 

The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX, 
EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP 
registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for 64- 
bit size. 

RAX 

64-bit version of the EAX register. 

RBP 

64-bit version of the EBP register. 

RBX 

64-bit version of the EBX register. 

RCX 

64-bit version of the ECX register. 
RDI

64-bit version of the EDI register. 

RDX 

64-bit version of the EDX register.

rFLAGS

16-bit, 32-bit, or 64-bit flags register. Compare RFLAGS. 

RFLAGS 

64-bit flags register. Compare rFLAGS.

rIP

16-bit, 32-bit, or 64-bit instruction-pointer register. Compare RIP. 

RIP 

64-bit instruction-pointer register. 

RSI 

64-bit version of the ESI register. 

RSP 

64-bit version of the ESP register. 

SP 

Stack pointer register. 

SS 

Stack segment register. 

TPR 

Task priority register (CR8), a new register introduced in the AMD64 architecture to speed 
interrupt management. 


TR 

Task register. 

Endian Order 

The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte 
values are stored with their least-significant byte at the lowest byte address, and they are illustrated 
with their least significant byte at the right side. Strings are illustrated in reverse order, because the 
addresses of their bytes increase from right to left. 
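
The byte ordering can be illustrated with a minimal Python sketch. The value 1234_5678h is an arbitrary example, and the helper name memory_image is hypothetical.

```python
def memory_image(value, size):
    """Return the bytes of value in little-endian memory order.

    Index 0 of the result corresponds to the lowest byte address.
    """
    return value.to_bytes(size, byteorder="little")


stored = memory_image(0x1234_5678, 4)
# The least-significant byte (78h) sits at the lowest address, and the
# most-significant byte (12h) at the highest address.
lowest_address_byte = stored[0]
highest_address_byte = stored[3]
```

Reading the four bytes back with int.from_bytes(stored, "little") recovers the original value.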
1 64-Bit Media Instruction Reference


This chapter describes the function, mnemonic syntax, opcodes, affected flags, and possible 
exceptions generated by the 64-bit media instructions. These instructions operate on data located in the 
64-bit MMX registers. Most of the instructions operate in parallel on sets of packed elements called 
vectors, although some operate on scalars. The instructions define both integer and floating-point 
operations, and include the legacy MMX™ instructions, the 3DNow!™ instructions, and the AMD 
extensions to the MMX and 3DNow! instruction sets. 

Each instruction that performs a vector (packed) operation is illustrated with a diagram. Figure 1-1 on 
page 1 shows the conventions used in these diagrams. The particular diagram shows the PSLLW 
(packed shift left logical words) instruction. 


[Figure 513-324.eps: PSLLW example. The first source operand, mmx1 (also the destination operand), and the second source operand, mmx2/mem64, are each shown as 64 bits, with mmx1 divided into four 16-bit elements at bits 63:48, 47:32, 31:16, and 15:0. Each element passes through a "shift left" operation.]

The callouts in the figure explain the conventions:

• Arrowheads going to a source operand indicate the writing of the result. In this case, the result
is written to the first source operand, which is also the destination operand.

• Arrowheads coming from a source operand indicate that the source operand provides a control
function. In this case, the second source operand specifies the number of bits to shift, and the
first source operand specifies the data to be shifted.

• The operation is named in the diagram. In this case, it is a bitwise shift-left.

• Ellipses indicate that the operation is repeated for each element of the source vectors. In this
case, there are 4 elements in each source vector, so the operation is performed 4 times, in parallel.

• The file name of the figure (513-324.eps in this case) is shown for documentation control.

Figure 1-1. Diagram Conventions for 64-Bit Media Instructions

Gray areas in diagrams indicate unmodified operand bits. 
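
The element-wise behavior that Figure 1-1 depicts can be sketched in Python as follows. This is a simplified model rather than the architectural definition: operands are plain integers, the function name psllw is chosen for the example, and the treatment of shift counts above 15 (all result bits cleared) is taken from the general behavior of PSLLW, not from the figure itself.

```python
def psllw(first_source, shift_count):
    """Model PSLLW on a 64-bit integer holding four 16-bit words.

    first_source is both the data to shift and the destination;
    shift_count is the value supplied by the second source operand.
    """
    if shift_count > 15:
        return 0  # every word is shifted out entirely
    result = 0
    for lane in range(4):  # the same operation is repeated per element
        word = (first_source >> (16 * lane)) & 0xFFFF
        shifted = (word << shift_count) & 0xFFFF  # bits leaving the word are dropped
        result |= shifted << (16 * lane)
    return result
```

Because each word is masked to 16 bits after shifting, bits never carry from one element into its neighbor.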
Like the 128-bit media instructions, many of the 64-bit instructions independently and simultaneously
perform a single operation on multiple elements of a vector and are thus classified as
single-instruction, multiple-data (SIMD) instructions. A few 64-bit media instructions convert operands in
MMX registers to operands in GPR, XMM, or x87 registers (or vice versa), or save or restore MMX
state, or reset x87 state.

Hardware support of the MMX instruction set and specific optional extensions can be determined by 
testing specific bits of the value returned in EDX by the CPUID instruction. If a specific bit is set in the 
return value, the feature is supported on the processor. The following lists the CPUID function 
numbers and feature bits indicating support for these features: 

• MMX Instructions, indicated by EDX[23] returned by CPUID function 0000_0001h and function
8000_0001h.

• AMD Extensions to MMX Instructions, indicated by EDX[22] of CPUID function 8000_0001h.

• SSE1, indicated by EDX[25] of CPUID function 0000_0001h.

• SSE2, indicated by EDX[26] of CPUID function 0000_0001h.

• AMD 3DNow! Instructions, indicated by EDX[31] of CPUID function 8000_0001h.

• AMD Extensions to 3DNow! Instructions, indicated by EDX[30] of CPUID function 8000_0001h.

• FXSAVE and FXRSTOR, indicated by EDX[24] of CPUID function 0000_0001h and function
8000_0001h.

The 64-bit media instructions can be used in legacy mode or long mode. Their use in long mode is 
available if the following CPUID function return bit is set: 

• Long Mode, indicated by EDX[29] of CPUID function 8000_0001h. 

For more information on using the CPUID instruction, see the instruction description in Volume 3. 
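
The feature bits listed above can be tested with simple shifts once the EDX values are available. The following Python sketch assumes the caller has already obtained EDX for each CPUID function by some other means; it does not execute CPUID. The table name, function name, and feature labels are hypothetical, and treating a feature as present when any of its listed bits is set is a simplifying assumption of this example.

```python
# Feature label -> (CPUID function, EDX bit) pairs taken from the list above.
FEATURE_BITS = {
    "MMX": [(0x0000_0001, 23), (0x8000_0001, 23)],
    "MMX extensions": [(0x8000_0001, 22)],
    "SSE1": [(0x0000_0001, 25)],
    "SSE2": [(0x0000_0001, 26)],
    "3DNow!": [(0x8000_0001, 31)],
    "3DNow! extensions": [(0x8000_0001, 30)],
    "FXSAVE/FXRSTOR": [(0x0000_0001, 24), (0x8000_0001, 24)],
    "Long mode": [(0x8000_0001, 29)],
}


def supported_features(edx_by_function):
    """Return the labels whose feature bit is set.

    edx_by_function maps a CPUID function code to the EDX value that the
    function returned; functions missing from the mapping count as zero.
    """
    found = set()
    for name, locations in FEATURE_BITS.items():
        for function, bit in locations:
            if (edx_by_function.get(function, 0) >> bit) & 1:
                found.add(name)
    return found
```

For instance, passing {0x0000_0001: 1 << 26} yields a set containing only "SSE2".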

Compilation of 64-bit media programs for execution in 64-bit mode offers four primary advantages: 
access to the eight extended, 64-bit general-purpose registers (for a register set consisting of 
GPR0-GPR15), access to the eight extended XMM registers (for a register set consisting of 
XMM0-XMM15), access to the 64-bit virtual address space, and access to the RIP-relative addressing 
mode. 

For further information, see: 

• “64-Bit Media Programming” in Volume 1. 

• “Summary of Registers and Data Types” in Volume 3. 

• “Notation” in Volume 3. 

• “Instruction Prefixes” in Volume 3. 
CVTPD2PI    Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory 
location to two packed 32-bit signed integer values and writes the converted values in an MMX 
register. 

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding
control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the result of
the conversion is outside the signed doubleword range (-2^31 to +2^31 - 1), the instruction
returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is
masked.
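
The conversion rule for a single element can be modeled in Python as a rough sketch. It covers only the masked invalid-operation response described above and does not model exception flags. The function names are hypothetical, and the RC encodings used here (0 = nearest even, 1 = down, 2 = up, 3 = truncate) are the standard MXCSR encodings, which this section does not itself list.

```python
import math

INDEFINITE = 0x8000_0000  # 32-bit indefinite integer value

# MXCSR.RC encoding -> rounding function (assumed standard encodings).
_ROUNDERS = {
    0: round,       # round to nearest, ties to even
    1: math.floor,  # round down, toward negative infinity
    2: math.ceil,   # round up, toward positive infinity
    3: math.trunc,  # truncate, toward zero
}


def convert_element(value, rc=0):
    """Convert one double to the bit pattern of a signed 32-bit integer."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE
    rounded = _ROUNDERS[rc](value)
    if not -2**31 <= rounded <= 2**31 - 1:
        return INDEFINITE  # result does not fit in a signed doubleword
    return rounded & 0xFFFF_FFFF  # two's-complement bit pattern


def cvtpd2pi(low_double, high_double, rc=0):
    """Return the 64-bit MMX result for the two source elements."""
    return (convert_element(high_double, rc) << 32) | convert_element(low_double, rc)
```

Note that -2^31 is a valid result whose bit pattern happens to equal the indefinite value, so software cannot tell the two apart from the result alone.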

The CVTPD2PI instruction is an SSE2 instruction. Support for this instruction set is indicated by
CPUID Fn0000_0001_EDX[SSE2] = 1. Support for misaligned 16-byte memory accesses is indicated
by CPUID Fn8000_0001_ECX[MisAlignSse] = 1.


Mnemonic                     Opcode         Description
CVTPD2PI mmx, xmm2/mem128    66 0F 2D /r    Converts packed double-precision floating-point values in an XMM register or 128-bit memory location to packed doubleword integer values in the destination MMX register.

[Figure cvtpd2pi.eps: the double-precision values in bits 127:64 and 63:0 of xmm/mem128 are each converted and written to bits 63:32 and 31:0 of mmx.]

Related Instructions 

CVTDQ2PD, CVTPD2DQ, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, 
CVTTSD2SI 

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag   MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                              M                   M
Bit    17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


Exceptions

Exception                                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
                                            X     X             X          The memory operand was not aligned on a 16-byte boundary while MXCSR.MM = 0.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled with MXCSR.MM = 1.
x87 floating-point exception pending, #MF   X     X             X          An exception is pending due to an x87 floating-point instruction.
SIMD Floating-Point Exception, #XF          X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)            X     X             X          A source operand was an SNaN value, a QNaN value, or ±infinity.
                                            X     X             X          A source operand was too large to fit in the destination format.
Precision exception (PE)                    X     X             X          A result could not be represented exactly in the destination format.
CVTPI2PD    Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to 
two double-precision floating-point values and writes the converted values in an XMM register. 

The CVTPI2PD instruction is an SSE2 instruction. Support for this instruction set is indicated by 
CPUID Fn0000_0001_EDX[SSE2] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic                   Opcode         Description
CVTPI2PD xmm, mmx/mem64    66 0F 2A /r    Converts two packed doubleword integer values in an MMX register or 64-bit memory location to two packed double-precision floating-point values in the destination XMM register.

[Figure cvtpi2pd.eps: the doubleword integers in bits 63:32 and 31:0 of mmx/mem64 are each converted and written to bits 127:64 and 63:0 of xmm.]

Related Instructions 

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, CVTTPD2PI, 
CVTTSD2SI 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 
Exceptions

Exception                                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
CVTPI2PS    Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to 
two single-precision floating-point values and writes the converted values in the low-order 64 bits of 
an XMM register. The high-order 64 bits of the XMM register are not modified. If the result of the 
conversion is an inexact value, the value is rounded as specified by the rounding control bits (RC) in 
the MXCSR register. 

The CVTPI2PS instruction is an SSE1 instruction. Support for this instruction set is indicated by 
CPUID Fn0000_0001_EDX[SSE] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic                   Opcode      Description
CVTPI2PS xmm, mmx/mem64    0F 2A /r    Converts packed doubleword integer values in an MMX register or 64-bit memory location to single-precision floating-point values in the destination XMM register.

[Figure cvtpi2ps.eps: the doubleword integers in bits 63:32 and 31:0 of mmx/mem64 are each converted and written to bits 63:32 and 31:0 of xmm. Bits 127:64 of xmm are not modified.]

Related Instructions 

CVTDQ2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, 
CVTTSS2SI 

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag   MM  FZ  RC     PM  UM  OM  ZM  DM  IM  DAZ  PE  UE  OE  ZE  DE  IE
State                                              M
Bit    17  15  14:13  12  11  10  9   8   7   6    5   4   3   2   1   0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


Exceptions

Exception                                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF          X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Precision exception (PE)                    X     X             X          A result could not be represented exactly in the destination format.
CVTPS2PI Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM 
register or a 64-bit memory location to two packed 32-bit signed integers and writes the converted 
values in an MMX register. 

If the result of the conversion is an inexact value, the value is rounded as specified by the rounding
control bits (RC) in the MXCSR register. If the floating-point value is a NaN or infinity, or if the
result of the conversion falls outside the signed doubleword range (-2^31 to +2^31 - 1), the instruction
returns the 32-bit indefinite integer value (8000_0000h) when the invalid-operation exception (IE) is
masked.
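
The masked-exception result described above can be modeled in software as follows. This is a sketch, not a statement of how any implementation works internally. The RC encoding used here (0 = nearest-even, 1 = down, 2 = up, 3 = toward zero) is an assumption not stated in this entry, and the helper names `pack_ps`, `round_by_rc`, and `cvtps2pi` are hypothetical.

```python
import math
import struct

INDEFINITE = 0x8000_0000   # 32-bit indefinite integer value from the text


def pack_ps(low, high):
    """Pack two single-precision values into one 64-bit source operand."""
    return struct.unpack("<Q", struct.pack("<2f", low, high))[0]


def round_by_rc(value, rc):
    """Round a finite float to an integer under an assumed MXCSR.RC encoding."""
    if rc == 0:
        return round(value)       # Python's round() is round-half-to-even
    if rc == 1:
        return math.floor(value)  # toward negative infinity
    if rc == 2:
        return math.ceil(value)   # toward positive infinity
    return math.trunc(value)      # toward zero


def cvtps2pi(src64, rc=0):
    """Return the 64-bit packed result when the IE exception is masked."""
    lanes = struct.unpack("<2f", struct.pack("<Q", src64))
    out = []
    for x in lanes:
        if math.isnan(x) or math.isinf(x):
            out.append(INDEFINITE)
            continue
        n = round_by_rc(x, rc)
        in_range = -2**31 <= n <= 2**31 - 1
        out.append(n & 0xFFFF_FFFF if in_range else INDEFINITE)
    return out[0] | (out[1] << 32)
```

For example, `cvtps2pi(pack_ps(2.5, -1.5))` rounds both lanes to the nearest even integer, while passing `rc=3` truncates them.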

The CVTPS2PI instruction is an SSE1 instruction. Support for this instruction set is indicated by 
CPUID Fn0000_0001_EDX[SSE] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic                   Opcode      Description

CVTPS2PI mmx, xmm/mem64    0F 2D /r    Converts packed single-precision floating-point values in an
                                       XMM register or 64-bit memory location to packed
                                       doubleword integers in the destination MMX register.

[Figure cvtps2pi.eps: the single-precision values in bits 63:32 and 31:0 of xmm/mem64 are each converted and written to bits 63:32 and 31:0 of mmx.]
Related Instructions 

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, CVTTPS2PI, 
CVTTSS2SI 

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Value                                                               M                             M
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF          X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)            X     X             X          A source operand was an SNaN value, a QNaN value, or ±infinity.
                                            X     X             X          A source operand was too large to fit in the destination format.
Precision exception (PE)                    X     X             X          A result could not be represented exactly in the destination format.
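
The #UD and #XF rows of the exception table depend on the same condition and differ only in the state of CR4.OSXMMEXCPT. The sketch below models that choice. It assumes the exception mask bits (IM through PM, MXCSR bits 12:7) are in the same order as the status flags (IE through PE, bits 5:0), as in the MXCSR layout shown for this instruction. The function name `delivered_fault` and the `detected` argument (a bit set of conditions found by the instruction) are hypothetical.

```python
STATUS_BITS = 0x3F   # IE, DE, ZE, OE, UE, PE in MXCSR bits 5:0
MASK_SHIFT = 7       # IM, DM, ZM, OM, UM, PM in MXCSR bits 12:7


def delivered_fault(detected, mxcsr, cr4_osxmmexcpt):
    """Return "#XF", "#UD", or None for a set of detected SIMD FP conditions."""
    masks = (mxcsr >> MASK_SHIFT) & STATUS_BITS
    unmasked = detected & STATUS_BITS & ~masks
    if not unmasked:
        return None                       # every detected condition is masked
    return "#XF" if cr4_osxmmexcpt else "#UD"


PE = 1 << 5   # precision exception condition
```

With all masks set (for example MXCSR = 1F80h) a detected precision condition yields no fault; clearing PM makes it unmasked, and the fault delivered then depends on CR4.OSXMMEXCPT.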
CVTTPD2PI Convert Packed Double-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed double-precision floating-point values in an XMM register or a 128-bit memory 
location to two packed 32-bit signed integer values and writes the converted values in an MMX 
register. 

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the
floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed
doubleword range (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value
(8000_0000h) when the invalid-operation exception (IE) is masked.
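
A rough software model of the truncating conversion, together with the MXCSR status flags it would report when exceptions are masked, is sketched below. The function name `cvttpd2pi` and the choice to return the flags as a set of names are illustrative assumptions; the sketch also assumes that a lane that signals IE does not additionally signal PE.

```python
import math

INDEFINITE = 0x8000_0000   # 32-bit indefinite integer value from the text


def cvttpd2pi(low, high):
    """Truncate two doubles to packed signed doublewords.

    Returns (packed 64-bit result, set of MXCSR flag names that would be set).
    """
    flags = set()

    def lane(x):
        # NaN, infinity, or an out-of-range result gives the indefinite value.
        if math.isnan(x) or math.isinf(x) or not -2**31 <= math.trunc(x) <= 2**31 - 1:
            flags.add("IE")
            return INDEFINITE
        n = math.trunc(x)          # round toward zero
        if n != x:
            flags.add("PE")        # result was inexact
        return n & 0xFFFF_FFFF

    packed = lane(low) | (lane(high) << 32)
    return packed, flags
```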

The CVTTPD2PI instruction is an SSE2 instruction. Support for this instruction set is indicated by 
CPUID Fn0000_0001_EDX[SSE2] = 1. Support for misaligned 16-byte memory accesses is indicated 
by CPUID Fn8000_0001_ECX[MisAlignSse] = 1. 

See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode         Description

CVTTPD2PI mmx, xmm/mem128     66 0F 2C /r    Converts packed double-precision floating-point
                                             values in an XMM register or 128-bit memory location
                                             to packed doubleword integer values in the
                                             destination MMX register. Inexact results are
                                             truncated.

[Figure cvttpd2pi.eps: the double-precision values in bits 127:64 and 63:0 of xmm/mem128 are each converted and written to bits 63:32 and 31:0 of mmx.]


Related Instructions 

CVTDQ2PD, CVTPD2DQ, CVTPD2PI, CVTPI2PD, CVTSD2SI, CVTSI2SD, CVTTPD2DQ, 
CVTTSD2SI 

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Value                                                               M                             M
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
                                            X     X             X          The memory operand was not aligned on a 16-byte boundary while MXCSR.MM = 0.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled with MXCSR.MM = 1.
x87 floating-point exception pending, #MF   X     X             X          An exception is pending due to an x87 floating-point instruction.
SIMD Floating-Point Exception, #XF          X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)            X     X             X          A source operand was an SNaN value, a QNaN value, or ±infinity.
                                            X     X             X          A source operand was too large to fit in the destination format.
Precision exception (PE)                    X     X             X          A result could not be represented exactly in the destination format.
CVTTPS2PI Convert Packed Single-Precision Floating-Point to Packed Doubleword Integers, Truncated

Converts two packed single-precision floating-point values in the low-order 64 bits of an XMM 
register or a 64-bit memory location to two packed 32-bit signed integer values and writes the 
converted values in an MMX register. 

If the result of the conversion is an inexact value, the value is truncated (rounded toward zero). If the
floating-point value is a NaN or infinity, or if the result of the conversion falls outside the signed
doubleword range (-2^31 to +2^31 - 1), the instruction returns the 32-bit indefinite integer value
(8000_0000h) when the invalid-operation exception (IE) is masked.

The CVTTPS2PI instruction is an SSE1 instruction. Support for this instruction set is indicated by 
CPUID Fn0000_0001_EDX[SSE] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic                     Opcode      Description

CVTTPS2PI mmx, xmm/mem64     0F 2C /r    Converts packed single-precision floating-point values in
                                         an XMM register or 64-bit memory location to doubleword
                                         integer values in the destination MMX register. Inexact
                                         results are truncated.

[Figure cvttps2pi.eps: the single-precision values in bits 63:32 and 31:0 of xmm/mem64 are each converted and written to bits 63:32 and 31:0 of mmx.]


Related Instructions 

CVTDQ2PS, CVTPI2PS, CVTPS2DQ, CVTPS2PI, CVTSI2SS, CVTSS2SI, CVTTPS2DQ, 
CVTTSS2SI 

rFLAGS Affected 

None 
MXCSR Flags Affected

Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Value                                                               M                             M
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank.


Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was cleared to 0. See SIMD Floating-Point Exceptions, below, for details.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
SIMD Floating-Point Exception, #XF          X     X             X          There was an unmasked SIMD floating-point exception while CR4.OSXMMEXCPT was set to 1. See SIMD Floating-Point Exceptions, below, for details.

SIMD Floating-Point Exceptions
Invalid-operation exception (IE)            X     X             X          A source operand was an SNaN value, a QNaN value, or ±infinity.
                                            X     X             X          A source operand was too large to fit in the destination format.
Precision exception (PE)                    X     X             X          A result could not be represented exactly in the destination format.
EMMS Exit Multimedia State

Clears the MMX state by setting the state of the x87 stack registers to empty (tag-bit encoding of all 1s
for all MMX registers), indicating that the contents of the registers are available for a new procedure,
such as an x87 floating-point procedure. This setting of the tag bits is referred to as “clearing the MMX
state”.

Because the MMX registers and tag word are shared with the x87 floating-point instructions, software 
should execute an EMMS instruction to clear the MMX state before executing code that includes x87 
floating-point instructions. 
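
The effect of EMMS on the tag bits can be pictured with a short model. This is a sketch that assumes the full x87 tag word holds one 2-bit tag per register (eight registers, 16 bits in total), with 11b meaning empty; see “Media and x87 Processor State” in Volume 2 for the authoritative layout. The names `emms` and `tag_of` are hypothetical.

```python
NUM_REGS = 8
TAG_EMPTY = 0b11   # assumed 2-bit tag encoding for an empty register


def emms():
    """Return the tag word after EMMS: every register tagged empty (all 1s)."""
    tag_word = 0
    for reg in range(NUM_REGS):
        tag_word |= TAG_EMPTY << (2 * reg)
    return tag_word


def tag_of(tag_word, reg):
    """Extract the 2-bit tag for one register from a tag word."""
    return (tag_word >> (2 * reg)) & 0b11
```

After the modeled EMMS, `tag_of(emms(), r)` reads 11b for every register, which is the state x87 code expects to start from.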

The functions of the EMMS and FEMMS instructions are identical. 

For details about the setting of x87 tag bits, see “Media and x87 Processor State” in Volume 2. 

The EMMS instruction is an MMX™ instruction. Support for the MMX instruction subset is indicated 
by CPUID Fn0000_0001_EDX[MMX] = 1 or CPUID Fn8000_0001_EDX[MMX] = 1. See “CPUID” 
in Volume 3 for more information about the CPUID instruction. 


Mnemonic    Opcode    Description

EMMS        0F 77     Clears the MMX state.

Related Instructions

FEMMS (a 3DNow! instruction)

rFLAGS Affected

None

Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
FEMMS Fast Exit Multimedia State

Clears the MMX state by setting the state of the x87 stack registers to empty (tag-bit encoding of all 1s
for all MMX registers), indicating that the contents of the registers are available for a new procedure,
such as an x87 floating-point procedure. This setting of the tag bits is referred to as “clearing the MMX
state”.

Because the MMX registers and tag word are shared with the x87 floating-point instructions, software 
should execute an EMMS or FEMMS instruction to clear the MMX state before executing code that 
includes x87 floating-point instructions. 

The functions of the FEMMS and EMMS instructions are identical. The FEMMS instruction is
supported for backward compatibility with certain AMD processors. Software that must be
compatible with both AMD and non-AMD processors should use the EMMS instruction.

FEMMS is a 3DNow! instruction. Support for this instruction subset is indicated by CPUID
Fn8000_0001_EDX[3DNow] = 1. See “CPUID” in Volume 3 for more information about the CPUID
instruction.

For details about the setting of x87 tag bits, see “Media and x87 Processor State” in Volume 2. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution

EMMS

Mnemonic    Opcode    Description

FEMMS       0F 0E     Clears MMX state.

Related Instructions

EMMS

rFLAGS Affected

None


64-Bit Media Instruction Reference, AMD64 Technology, 26569, Rev. 3.15, May 2018

Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
FRSTOR Floating-Point Restore x87 and MMX™ State

Restores the complete x87 state from memory starting at the specified address, as stored by a previous
call to FNSAVE. The x87 state occupies 94 or 108 bytes of memory depending on whether the
processor is operating in real or protected mode and whether the operand-size attribute is 16-bit or
32-bit. Because the MMX registers are mapped onto the low 64 bits of the x87 floating-point registers,
this operation also restores the MMX state.
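
The two image sizes can be reproduced with simple arithmetic. The breakdown used below is an assumption that is not stated in this entry: eight data registers of 10 bytes each, preceded by an environment of 14 bytes (16-bit operand size) or 28 bytes (32-bit operand size). The function name `x87_image_size` is hypothetical; see “Media and x87 Processor State” in Volume 2 for the actual layouts.

```python
X87_DATA_REGS = 8
BYTES_PER_REG = 10                    # each register is stored as an 80-bit value
ENV_BYTES = {16: 14, 32: 28}          # assumed environment size per operand size


def x87_image_size(operand_size_bits):
    """Return the FSAVE/FRSTOR image size in bytes for a 16- or 32-bit operand size."""
    if operand_size_bits not in ENV_BYTES:
        raise ValueError("operand size must be 16 or 32 bits")
    return ENV_BYTES[operand_size_bits] + X87_DATA_REGS * BYTES_PER_REG
```

Under these assumptions the function yields 94 bytes for a 16-bit operand size and 108 bytes for a 32-bit operand size, matching the mem94/108env operand.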

If FRSTOR results in set exception flags in the loaded x87 status word register, and these exceptions 
are unmasked in the x87 control word register, a floating-point exception occurs when the next 
floating-point instruction is executed (except for the no-wait floating-point instructions). 

To avoid generating exceptions when loading a new environment, use the FCLEX or FNCLEX 
instruction to clear the exception flags in the x87 status word before storing that environment. 

For details about the memory image restored by FRSTOR, see “Media and x87 Processor State” in 
Volume 2. 


Mnemonic                Opcode    Description

FRSTOR mem94/108env     DD /4     Load the x87 state from mem94/108env.


Related Instructions 

FSAVE, FNSAVE, FXSAVE, FXRSTOR 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code    Value    Description
C0                    M        Loaded from memory.
C1                    M        Loaded from memory.
C2                    M        Loaded from memory.
C3                    M        Loaded from memory.

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                   X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
FSAVE (FNSAVE) Floating-Point Save x87 and MMX™ State

Stores the complete x87 state to memory starting at the specified address and reinitializes the x87 state. 
The x87 state requires 94 or 108 bytes of memory, depending upon whether the processor is operating 
in real or protected mode and whether the operand-size attribute is 16-bit or 32-bit. Because the MMX 
registers are mapped onto the low 64 bits of the x87 floating-point registers, this operation also saves 
the MMX state. For details about the memory image saved by FNSAVE, see “Media and x87 
Processor State” in Volume 2. 

The FNSAVE instruction does not wait for pending unmasked x87 floating-point exceptions to be 
processed. Processor interrupts should be disabled before using this instruction. 

Assemblers usually provide an FSAVE macro that expands into the instruction sequence: 

WAIT ; Opcode 9B 

FNSAVE destination ; Opcode DD /6 

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler, if 
necessary. The FNSAVE instruction then stores the x87 state to the specified destination. 
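
The byte sequence produced by the FSAVE expansion above can be sketched for the simplest operand form. The code assumes a register-indirect memory operand (ModRM.mod = 00b) whose base register needs no SIB byte or displacement, which in 32-bit and 64-bit addressing excludes rm values 4 and 5. The helper names are hypothetical.

```python
WAIT_OPCODE = 0x9B
FNSAVE_OPCODE = 0xDD
FNSAVE_REG_FIELD = 6   # the "/6" opcode extension carried in ModRM.reg


def modrm(mod, reg, rm):
    """Assemble a ModRM byte from its three fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_fsave_indirect(rm):
    """Encode WAIT followed by FNSAVE [base], where rm selects the base register."""
    if rm not in (0, 1, 2, 3, 6, 7):
        raise ValueError("rm needs a SIB byte or displacement; not handled here")
    return bytes([WAIT_OPCODE, FNSAVE_OPCODE, modrm(0b00, FNSAVE_REG_FIELD, rm)])
```

Dropping the leading 9Bh byte from the result gives the FNSAVE form of the same operand.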


Mnemonic                Opcode      Description

FNSAVE mem94/108env     DD /6       Copy the x87 state to mem94/108env without checking for
                                    pending floating-point exceptions, then reinitialize the x87
                                    state.

FSAVE mem94/108env      9B DD /6    Copy the x87 state to mem94/108env after checking for
                                    pending floating-point exceptions, then reinitialize the x87
                                    state.


Related Instructions 

FRSTOR, FXSAVE, FXRSTOR 


rFLAGS Affected 

None 


x87 Condition Code

x87 Condition Code    Value    Description
C0                    0
C1                    0
C2                    0
C3                    0
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                   X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
FXRSTOR Restore XMM, MMX™, and x87 State

Restores the XMM, MMX, and x87 state. The data loaded from memory is the state information 
previously saved using the FXSAVE instruction. Restoring data with FXRSTOR that had been 
previously saved with an FSAVE (rather than FXSAVE) instruction results in an incorrect restoration. 

If FXRSTOR results in set exception flags in the loaded x87 status word register, and these exceptions 
are unmasked in the x87 control word register, a floating-point exception occurs when the next 
floating-point instruction is executed (except for the no-wait floating-point instructions). 

If the restored MXCSR register contains a set bit in an exception status flag, and the corresponding 
exception mask bit is cleared (indicating an unmasked exception), loading the MXCSR register does 
not cause a SIMD floating-point exception (#XF). 

FXRSTOR does not restore the x87 error pointers (last instruction pointer, last data pointer, and last 
opcode), except when FXRSTOR sets FSW.ES=1 after recomputing it from the error mask bits in 
FCW and error status bits in FSW. 

The architecture supports two 512-byte memory formats for FXRSTOR: a 64-bit format that loads
XMM0-XMM15, and a 32-bit legacy format that loads only XMM0-XMM7. If FXRSTOR is
executed in 64-bit mode, the 64-bit format is used; otherwise the 32-bit format is used. When the
64-bit format is used, if the operand size is 64-bit, FXRSTOR loads the x87 pointer registers as offset64;
otherwise it loads them as sel:offset32. For details about the memory format used by FXRSTOR, see
“Saving Media and x87 Processor State” in Volume 2. For details about the memory image restored by
FXRSTOR, see “Saving Media and x87 Execution Unit State” in Volume 2.

If the fast-FXSAVE/FXRSTOR (FFXSR) feature is enabled in EFER, FXRSTOR does not restore the 
XMM registers (XMM0-XMM15) when executed in 64-bit mode at CPL 0. MXCSR is restored 
whether fast-FXSAVE/FXRSTOR is enabled or not. 

Support for the fast-FXSAVE/FXRSTOR feature is indicated by CPUID
Fn8000_0001_EDX[FFXSR] = 1.

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, the saved 
image of XMM0-XMM15 and MXCSR is not loaded into the processor. A general-protection 
exception occurs if the FXRSTOR instruction attempts to load non-zero values into reserved MXCSR 
bits. Software can use MXCSR_MASK to determine which bits of MXCSR are reserved. For details 
on the MXCSR_MASK, see “SSE, MMX, and x87 Programming” in Volume 2. 
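
Software that edits a saved image before FXRSTOR can screen the MXCSR value first. The sketch below assumes the MXCSR_MASK value has already been read from a save image, and that a mask field of 0 should be treated as 0000_FFBFh; that default is an assumption not stated in this entry and should be checked against Volume 2. The function name is hypothetical.

```python
DEFAULT_MXCSR_MASK = 0x0000_FFBF   # assumed mask when the MXCSR_MASK field reads 0


def mxcsr_reserved_bits_set(mxcsr, mxcsr_mask):
    """Return True if mxcsr has a 1 in any bit that the mask marks as reserved."""
    mask = mxcsr_mask or DEFAULT_MXCSR_MASK
    return (mxcsr & ~mask & 0xFFFF_FFFF) != 0
```

A True result means that restoring the image would attempt to load non-zero values into reserved MXCSR bits, the condition described above as causing a general-protection exception.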

Support for this instruction is implementation-specific. CPUID Fn8000_0001_EDX[FXSR] = 1 or 
CPUID Fn0000_0001_EDX[FXSR] = 1 indicates support for the FXSAVE and FXRSTOR 
instructions. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                Opcode      Description

FXRSTOR mem512env       0F AE /1    Restores XMM, MMX™, and x87 state from 512-byte
                                    memory location.
Related Instructions

FWAIT, FXSAVE 

rFLAGS Affected 

None 


MXCSR Flags Affected

Flag    MM    FZ    RC    PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
Value   M     M     M M   M     M     M     M     M     M     M     M     M     M     M     M     M
Bit     17    15    14:13 12    11    10    9     8     7     6     5     4     3     2     1     0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank. Shaded fields are reserved.


Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The FXSAVE/FXRSTOR instructions are not supported, as indicated by EDX[FXSR] = 0, returned by CPUID Fn0000_0001 or CPUID Fn8000_0001.
Device not available, #NM                   X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit, or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded the data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
                                            X     X             X          The memory operand was not aligned on a 16-byte boundary.
                                            X     X             X          Ones were written to the reserved bits in MXCSR.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
FXSAVE Save XMM, MMX, and x87 State

Saves the XMM, MMX, and x87 state. A memory location that is not aligned on a 16-byte boundary 
causes a general-protection exception. 
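
One common way to satisfy the 16-byte alignment requirement is to over-allocate a buffer and round its address up. The sketch below shows only the address arithmetic; the names `is_aligned`, `align_up`, and `carve_save_area` are hypothetical, and addresses are plain integers.

```python
FXSAVE_AREA_BYTES = 512
ALIGNMENT = 16


def is_aligned(address, alignment=ALIGNMENT):
    """Return True if address is a multiple of alignment."""
    return address % alignment == 0


def align_up(address, alignment=ALIGNMENT):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def carve_save_area(buffer_address, buffer_bytes):
    """Return an aligned start address for a 512-byte save area inside a buffer."""
    start = align_up(buffer_address)
    if start + FXSAVE_AREA_BYTES > buffer_address + buffer_bytes:
        raise ValueError("buffer too small for an aligned 512-byte save area")
    return start
```

A buffer of 512 + 15 bytes is always large enough, whatever its starting address.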

Unlike FSAVE and FNSAVE, FXSAVE does not alter the x87 tag bits. The contents of the saved 
MMX/x87 data registers are retained, thus indicating that the registers may be valid (or whatever other 
value the x87 tag bits indicated prior to the save). To invalidate the contents of the MMX/x87 data 
registers after FXSAVE, software must execute an FINIT instruction. Also, FXSAVE (like FNSAVE) 
does not check for pending unmasked x87 floating-point exceptions. An FWAIT instruction can be 
used for this purpose. 

FXSAVE does not save the x87 pointer registers (last instruction pointer, last data pointer, and last 
opcode), except in the relatively rare cases in which the exception-summary (ES) bit in the x87 status 
word is set to 1, indicating that an unmasked x87 exception has occurred. 

The architecture supports two 512-byte memory formats for FXSAVE: a 64-bit format that saves
XMM0-XMM15, and a 32-bit legacy format that saves only XMM0-XMM7. If FXSAVE is executed
in 64-bit mode, the 64-bit format is used; otherwise the 32-bit format is used. When the 64-bit format is
used, if the operand size is 64-bit, FXSAVE saves the x87 pointer registers as offset64; otherwise it
saves them as sel:offset32. For more details about the memory format used by FXSAVE, see “Saving
Media and x87 Execution Unit State” in Volume 2.

If the fast-FXSAVE/FXRSTOR (FFXSR) feature is enabled in EFER, FXSAVE does not save the 
XMM registers (XMM0-XMM15) when executed in 64-bit mode at CPL 0. MXCSR is saved whether 
fast-FXSAVE/FXRSTOR is enabled or not. 

Support for the fast-FXSAVE/FXRSTOR feature is indicated by CPUID 
Fn8000_0001_EDX[FFXSR] = 1. 

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, FXSAVE 
does not save the image of XMM0-XMM15 or MXCSR. For details about the CR4.OSFXSR bit, see 
“FXSAVE/FXRSTOR Support (OSFXSR) Bit” in Volume 2. 
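
The rules in the preceding paragraphs (format selection, the fast-FXSAVE/FXRSTOR feature, and CR4.OSFXSR) can be combined into one decision function. This is a sketch under stated assumptions: the function and key names are hypothetical, and the pointer form reported for the 32-bit legacy format (sel:offset32) is an assumption, since the text above specifies the pointer form only for the 64-bit format.

```python
def fxsave_plan(in_64bit_mode, operand_size_bits, cpl, efer_ffxsr, cr4_osfxsr):
    """Summarize which format FXSAVE uses and which components it writes."""
    if in_64bit_mode and operand_size_bits == 64:
        pointers = "offset64"
    else:
        pointers = "sel:offset32"   # assumed for the 32-bit legacy format
    # FFXSR suppresses the XMM registers only in 64-bit mode at CPL 0.
    fast_path = bool(efer_ffxsr and in_64bit_mode and cpl == 0)
    return {
        "format": "64-bit" if in_64bit_mode else "32-bit legacy",
        "xmm_registers": 16 if in_64bit_mode else 8,
        "x87_pointers": pointers,
        "save_xmm": bool(cr4_osfxsr) and not fast_path,
        "save_mxcsr": bool(cr4_osfxsr),   # unaffected by FFXSR
    }
```

For instance, a kernel running in 64-bit mode at CPL 0 with FFXSR enabled gets `save_xmm` false but `save_mxcsr` true, matching the statement that MXCSR is saved whether or not the fast feature is enabled.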

Support for this instruction is implementation-specific. CPUID Fn8000_0001_EDX[FXSR] = 1 or 
CPUID Fn0000_0001_EDX[FXSR] = 1 indicates support for the FXSAVE and FXRSTOR 
instructions. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                Opcode      Description

FXSAVE mem512env        0F AE /0    Saves XMM, MMX, and x87 state to 512-byte memory
                                    location.


Related Instructions 

FINIT, FNSAVE, FRSTOR, FSAVE, FXRSTOR, LDMXCSR, STMXCSR 
rFLAGS Affected

None 

MXCSR Flags Affected 

None 

Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The FXSAVE/FXRSTOR instructions are not supported, as indicated by EDX[FXSR] = 0, returned by CPUID Fn0000_0001 or CPUID Fn8000_0001.
Device not available, #NM                   X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit, or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded the data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
                                            X     X             X          The memory operand was not aligned on a 16-byte boundary.
                                                                X          The destination operand was in a non-writable segment.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
MASKMOVQ Masked Move Quadword

Stores bytes from the first source operand, as selected by the second source operand, to a memory 
location specified in the DS:rDI registers (except that DS is ignored in 64-bit mode). The first source 
operand is an MMX register, and the second source operand is another MMX register. The most- 
significant bit (msb) of each byte in the second source operand specifies the store (1 = store, 0 = no 
store) of the corresponding byte of the first source operand. 
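
The byte-selection rule can be modeled with a short function. This is a sketch that treats memory as a Python bytearray and assumes the usual little-endian correspondence in which byte i of the register maps to the address in rDI plus i; the function name and parameters are hypothetical, and the write-combining and ordering behavior of the real instruction is not modeled.

```python
def maskmovq(src, mask, memory, address):
    """Store the bytes of src whose mask byte has its most-significant bit set.

    src and mask are 64-bit integers; memory is a bytearray; address is the
    offset that stands in for the DS:rDI destination.
    """
    for i in range(8):
        mask_byte = (mask >> (8 * i)) & 0xFF
        if mask_byte & 0x80:                       # msb set: store this byte
            memory[address + i] = (src >> (8 * i)) & 0xFF
        # msb clear: the destination byte is left unchanged
```

Only bit 7 of each mask byte matters, so mask bytes of 80h and FFh both select a store, while 7Fh does not.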

Exception and trap behavior for the elements not selected for storage to memory is implementation
dependent. For instance, a given implementation may signal a data breakpoint or a page fault for bytes
that are zero-masked and not actually written.

MASKMOVQ implicitly uses weakly-ordered, write-combining buffering for the data, as described in 
“Buffering and Combining Memory Writes” in Volume 2. If the stored data is shared by multiple 
processors, this instruction should be used together with a fence instruction in order to ensure data 
coherency (refer to “Cache and TLB Management” in Volume 2). 

The MASKMOVQ instruction is an AMD extension to the MMX™ instruction set and is an SSE1
instruction. Support for AMD extensions to the MMX instruction subset is indicated by CPUID
Fn8000_0001_EDX[MmxExt] = 1. See “CPUID” in Volume 3 for more information about the CPUID
instruction.

Mnemonic                Opcode      Description

MASKMOVQ mmx1, mmx2     0F F7 /r    Store bytes from an MMX register, selected by the most-
                                    significant bit of the corresponding byte in another MMX
                                    register, to DS:rDI.



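[Figure maskmovq.eps: the most-significant bit of each byte of mmx2 (bits 63, 55, 47, 39, 31, 23, 15, and 7) selects whether the corresponding byte of mmx1 is stored to memory.]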


Related Instructions 

MASKMOVDQU 

Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0 and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVD Move Doubleword or Quadword

Moves a 32-bit or 64-bit value in one of the following ways: 

• from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 or 64 bits 
of an XMM register, with zero-extension to 128 bits 

• from the low-order 32 or 64 bits of an XMM register to a 32-bit or 64-bit general-purpose register
or memory location

• from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 bits (with 
zero-extension to 64 bits) or the full 64 bits of an MMX register 

• from the low-order 32 or the full 64 bits of an MMX register to a 32-bit or 64-bit general-purpose 
register or memory location. 
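
The 32-bit MMX forms in the list above can be modeled in portable C. The following is only a sketch: it represents an MMX register as a `uint64_t`, and the function names `movd_to_mmx32` and `movd_from_mmx32` are hypothetical helpers introduced here for illustration, not part of any AMD interface.

```c
#include <stdint.h>

/* Assumption: a 64-bit MMX register is modeled as a uint64_t. */

/* MOVD mmx, reg/mem32: the 32-bit source is written to bits 31-0 and
   bits 63-32 of the MMX register are cleared (zero-extension). */
static uint64_t movd_to_mmx32(uint32_t src)
{
    return (uint64_t)src;
}

/* MOVD reg/mem32, mmx: only the low-order 32 bits are copied out. */
static uint32_t movd_from_mmx32(uint64_t mmx)
{
    return (uint32_t)(mmx & UINT64_C(0xFFFFFFFF));
}
```

The 64-bit forms copy all 64 bits unchanged, so they need no separate model.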

The MOVD instruction is a member of both the MMX and the SSE2 instruction sets. The presence of
the MMX instruction set is indicated by EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or
CPUID Fn8000_0001; the presence of the SSE2 instruction set is indicated by
CPUID Fn0000_0001_EDX[SSE2] = 1. See “CPUID” in Volume 3 for more information about the
CPUID instruction.

Mnemonic               Opcode     Description
MOVD mmx, reg/mem32    0F 6E /r   Move 32-bit value from a general-purpose register or 32-bit memory location to an MMX register.
MOVD mmx, reg/mem64    0F 6E /r   Move 64-bit value from a general-purpose register or 64-bit memory location to an MMX register.
MOVD reg/mem32, mmx    0F 7E /r   Move 32-bit value from an MMX register to a 32-bit general-purpose register or memory location.
MOVD reg/mem64, mmx    0F 7E /r   Move 64-bit value from an MMX register to a 64-bit general-purpose register or memory location.

The following diagrams illustrate the operation of the MOVD instruction.
[Figure: MOVD operation. All operations are "copy", between reg/mem32 or reg/mem64 and the mmx register (bits 63-32 and 31-0).]

Related Instructions

MOVDQA, MOVDQU, MOVDQ2Q, MOVQ, MOVQ2DQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                                   Real  Virtual 8086  Protected  Description
Invalid opcode, #UD                         X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0 returned by CPUID function 0000_0001h or 8000_0001h.
                                            X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                                            X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The instruction used XMM registers while CR4.OSFXSR=0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          The destination operand was in a non-writable segment.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVDQ2Q Move Quadword to Quadword

Moves the low-order 64-bit value in an XMM register to a 64-bit MMX register. 

The MOVDQ2Q instruction is an SSE2 instruction. Support for this instruction subset is indicated by 
CPUID Fn0000_0001_EDX[SSE2] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic            Opcode        Description
MOVDQ2Q mmx, xmm    F2 0F D6 /r   Moves low-order 64-bit value from an XMM register to the destination MMX register.


[Figure: MOVDQ2Q operation. The low-order quadword (bits 63-0) of xmm is copied to mmx (movdq2q.eps).]


Related Instructions 

MOVD, MOVDQA, MOVDQU, MOVQ, MOVQ2DQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 


None 

Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
General protection, #GP                     X     X             X          The destination operand was in a non-writable segment.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
MOVNTQ Move Non-Temporal Quadword

Stores a 64-bit MMX register value into a 64-bit memory location. This instruction indicates to the 
processor that the data is non-temporal, and is unlikely to be used again soon. The processor treats the 
store as a write-combining (WC) memory write, which minimizes cache pollution. The exact method 
by which cache pollution is minimized depends on the hardware implementation of the instruction. 
For further information, see “Memory Optimization” in Volume 1. 

MOVNTQ is weakly-ordered with respect to other instructions that operate on memory. Software 
should use an SFENCE instruction to force strong memory ordering of MOVNTQ with respect to 
other stores. 

MOVNTQ implicitly uses weakly-ordered, write-combining buffering for the data, as described in 
“Buffering and Combining Memory Writes” in Volume 2. For data that is shared by multiple 
processors, this instruction should be used together with a fence instruction in order to ensure data 
coherency (refer to “Cache and TLB Management” in Volume 2). 
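
The store-then-fence pattern described above might look like the following in C. This is a hedged sketch, not a definitive implementation: it assumes a GCC- or Clang-style compiler targeting x86-64 with the `<xmmintrin.h>` intrinsics `_mm_stream_pi`, `_mm_sfence`, and `_mm_empty` available, and it falls back to an ordinary store elsewhere. The helper name `store_nt_and_fence` is hypothetical, and the compiler is free to choose the exact non-temporal store encoding.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__MMX__) && defined(__SSE__)
#include <xmmintrin.h>   /* _mm_stream_pi, _mm_sfence; also provides the MMX intrinsics */
#define HAVE_NT_STORE 1
#else
#define HAVE_NT_STORE 0
#endif

/* Hypothetical helper: write `value` to *dst with a non-temporal hint,
   then fence so the store is ordered with respect to later stores. */
static void store_nt_and_fence(uint64_t *dst, uint64_t value)
{
#if HAVE_NT_STORE
    __m64 v = _mm_cvtsi64_m64((long long)value); /* move the value into an MMX-typed operand */
    _mm_stream_pi((__m64 *)dst, v);              /* non-temporal 64-bit store (MOVNTQ-style) */
    _mm_sfence();                                /* SFENCE: order this store before later stores */
    _mm_empty();                                 /* EMMS: leave MMX state before any x87 code runs */
#else
    *dst = value;                                /* fallback: ordinary store, no cache hint */
#endif
}
```

The fence is placed immediately after the store for clarity; code that streams many quadwords would normally issue a single fence after the last store.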

The MOVNTQ instruction is a member of both the AMD MMX extensions and the SSE1 instruction 
sets. Support for the SSE1 instruction subset is indicated by CPUID Fn0000_0001_EDX[SSE] = 1. 
Support for AMD’s extensions to the MMX instruction subset is indicated by CPUID
Fn8000_0001_EDX[MmxExt] = 1. See “CPUID” in Volume 3 for more information about the CPUID
instruction.
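
Because MOVNTQ belongs to both subsets, either feature bit is enough to indicate support. The sketch below decodes already-captured EDX values rather than executing CPUID itself. The bit positions (bit 25 for SSE in Fn0000_0001 EDX, bit 22 for MmxExt in Fn8000_0001 EDX) are taken from the CPUID specification rather than from this section, and `movntq_supported` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, per the CPUID specification (not stated in this section). */
#define FN0000_0001_EDX_SSE     (UINT32_C(1) << 25)
#define FN8000_0001_EDX_MMXEXT  (UINT32_C(1) << 22)

/* Returns true when either the SSE1 subset or AMD's MMX extensions are
   reported, given the EDX values returned by the two CPUID functions. */
static bool movntq_supported(uint32_t fn0000_0001_edx, uint32_t fn8000_0001_edx)
{
    return (fn0000_0001_edx & FN0000_0001_EDX_SSE) != 0 ||
           (fn8000_0001_edx & FN8000_0001_EDX_MMXEXT) != 0;
}
```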


Mnemonic             Opcode     Description
MOVNTQ mem64, mmx    0F E7 /r   Stores a 64-bit MMX register value into a 64-bit memory location, minimizing cache pollution.


[Figure: MOVNTQ operation. The 64-bit value (bits 63-0) in mmx is copied to mem64 (movntq.eps).]


Related Instructions 

MOVNTDQ, MOVNTI, MOVNTPD, MOVNTPS 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0 and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          The destination operand was in a non-writable segment.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVQ Move Quadword

Moves a 64-bit value: 

• from an MMX register or 64-bit memory location to another MMX register, or 

• from an MMX register to another MMX register or 64-bit memory location. 

The MOVQ instruction is an MMX™ instruction. Support for this instruction subset is indicated by 
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                  Opcode     Description
MOVQ mmx1, mmx2/mem64     0F 6F /r   Moves 64-bit value from an MMX register or memory location to an MMX register.
MOVQ mmx1/mem64, mmx2     0F 7F /r   Moves 64-bit value from an MMX register to an MMX register or memory location.


[Figure: MOVQ operation. The 64-bit value (bits 63-0) in mmx2/mem64 is copied to mmx1, or the value in mmx2 is copied to mmx1/mem64 (movq-64.eps).]


Related Instructions 

MOVD, MOVDQA, MOVDQU, MOVDQ2Q, MOVQ2DQ 

rFLAGS Affected 

None 




Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
                                                                X          The destination operand was in a non-writable segment.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVQ2DQ Move Quadword to Quadword

Moves a 64-bit value from an MMX register to the low-order 64 bits of an XMM register, with
zero-extension to 128 bits.
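
As a rough model of the zero-extension, the sketch below represents an XMM register as two 64-bit halves. The type `xmm_model` and the function `movq2dq` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Assumption: a 128-bit XMM register is modeled as two 64-bit halves. */
typedef struct {
    uint64_t lo;  /* bits 63-0   */
    uint64_t hi;  /* bits 127-64 */
} xmm_model;

/* MOVQ2DQ xmm, mmx: copy the MMX value into the low quadword and
   clear the high quadword (zero-extension to 128 bits). */
static xmm_model movq2dq(uint64_t mmx)
{
    xmm_model x;
    x.lo = mmx;
    x.hi = 0;
    return x;
}
```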

The MOVQ2DQ instruction is an SSE2 instruction. Support for this instruction subset is indicated by 
CPUID Fn0000_0001_EDX[SSE2] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic            Opcode        Description
MOVQ2DQ xmm, mmx    F3 0F D6 /r   Moves 64-bit value from an MMX register to an XMM register.


[Figure: MOVQ2DQ operation. The 64-bit value (bits 63-0) in mmx is copied to bits 63-0 of xmm; bits 127-64 of xmm are cleared to 0 (movq2dq.eps).]


Related Instructions 

MOVD, MOVDQA, MOVDQU, MOVDQ2Q, MOVQ 

rFLAGS Affected 

None 

MXCSR Flags Affected 

None 


Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                                            X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
PACKSSDW Pack with Saturation Signed Doubleword to Word

Converts each 32-bit signed integer in the first and second source operands to a 16-bit signed integer 
and packs the converted values into words in the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

Converted values from the first source operand are packed into the low-order words of the destination, 
and the converted values from the second source operand are packed into the high-order words of the 
destination. 

For each packed value in the destination, if the value is larger than the largest signed 16-bit integer, it is 
saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 
8000h. 
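
The packing order and saturation rule can be illustrated with a scalar C model. This is a sketch under stated assumptions: an MMX register is represented as a `uint64_t`, the names `sat_s16` and `packssdw` are hypothetical, and the unsigned-to-signed casts rely on the two's-complement conversion that mainstream compilers provide.

```c
#include <stdint.h>

/* Clamp a signed 32-bit value to the signed 16-bit range. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;   /* saturates to 7FFFh */
    if (v < INT16_MIN) return INT16_MIN;   /* saturates to 8000h */
    return (int16_t)v;
}

/* Model of PACKSSDW: dst is the first source/destination, src the second source. */
static uint64_t packssdw(uint64_t dst, uint64_t src)
{
    uint64_t r = 0;
    for (int i = 0; i < 2; i++) {
        int32_t a = (int32_t)(uint32_t)(dst >> (32 * i));
        int32_t b = (int32_t)(uint32_t)(src >> (32 * i));
        /* First-source results fill the low-order words. */
        r |= (uint64_t)(uint16_t)sat_s16(a) << (16 * i);
        /* Second-source results fill the high-order words. */
        r |= (uint64_t)(uint16_t)sat_s16(b) << (16 * (i + 2));
    }
    return r;
}
```

For example, a first-source doubleword of 00012345h is too large for a signed word and packs as 7FFFh, while FFFF0000h (negative and too small) packs as 8000h.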

The PACKSSDW instruction is an MMX™ instruction. Support for this instruction subset is indicated 
by EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                         Opcode     Description
PACKSSDW mmx1, mmx2/mem64        0F 6B /r   Packs 32-bit signed integers in an MMX register and another MMX register or 64-bit memory location into 16-bit signed integers in an MMX register.


[Figure: PACKSSDW operation on mmx1 and mmx2/mem64]

Related Instructions 

PACKSSWB, PACKUSWB 




rFLAGS Affected 

None 

Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PACKSSWB Pack with Saturation Signed Word to Byte

Converts each 16-bit signed integer in the first and second source operands to an 8-bit signed integer 
and packs the converted values into bytes in the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

Converted values from the first source operand are packed into the low-order bytes of the destination, 
and the converted values from the second source operand are packed into the high-order bytes of the 
destination. 

For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is 
saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h. 
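
A scalar C model of this behavior is sketched below. It is illustrative only: an MMX register is represented as a `uint64_t`, `sat_s8` and `packsswb` are hypothetical names, and the signed casts assume two's-complement conversion.

```c
#include <stdint.h>

/* Clamp a signed 16-bit value to the signed 8-bit range. */
static int8_t sat_s8(int16_t v)
{
    if (v > INT8_MAX) return INT8_MAX;   /* saturates to 7Fh */
    if (v < INT8_MIN) return INT8_MIN;   /* saturates to 80h */
    return (int8_t)v;
}

/* Model of PACKSSWB: dst is the first source/destination, src the second source. */
static uint64_t packsswb(uint64_t dst, uint64_t src)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int16_t a = (int16_t)(uint16_t)(dst >> (16 * i));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * i));
        /* First-source results fill bytes 0-3, second-source results bytes 4-7. */
        r |= (uint64_t)(uint8_t)sat_s8(a) << (8 * i);
        r |= (uint64_t)(uint8_t)sat_s8(b) << (8 * (i + 4));
    }
    return r;
}
```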

The PACKSSWB instruction is an MMX instruction. Support for this instruction subset is indicated by 
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                         Opcode     Description
PACKSSWB mmx1, mmx2/mem64        0F 63 /r   Packs 16-bit signed integers in an MMX register and another MMX register or 64-bit memory location into 8-bit signed integers in an MMX register.


[Figure: PACKSSWB operation on mmx1 and mmx2/mem64]

Related Instructions 

PACKSSDW, PACKUSWB 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PACKUSWB Pack with Saturation Signed Word to Unsigned Byte

Converts each 16-bit signed integer in the first and second source operands to an 8-bit unsigned integer 
and packs the converted values into bytes in the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

Converted values from the first source operand are packed into the low-order bytes of the destination, 
and the converted values from the second source operand are packed into the high-order bytes of the 
destination. 

For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it
is saturated to FFh, and if the value is smaller than the smallest unsigned 8-bit integer, it is saturated to
00h.
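
The signed-to-unsigned saturation can be sketched in scalar C as follows. The model treats an MMX register as a `uint64_t`; `sat_u8` and `packuswb` are hypothetical names, and the signed cast assumes two's-complement conversion.

```c
#include <stdint.h>

/* Clamp a signed 16-bit value to the unsigned 8-bit range. */
static uint8_t sat_u8(int16_t v)
{
    if (v > 255) return 255;   /* saturates to FFh */
    if (v < 0)   return 0;     /* negative inputs saturate to 00h */
    return (uint8_t)v;
}

/* Model of PACKUSWB: dst is the first source/destination, src the second source. */
static uint64_t packuswb(uint64_t dst, uint64_t src)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int16_t a = (int16_t)(uint16_t)(dst >> (16 * i));
        int16_t b = (int16_t)(uint16_t)(src >> (16 * i));
        /* First-source results fill bytes 0-3, second-source results bytes 4-7. */
        r |= (uint64_t)sat_u8(a) << (8 * i);
        r |= (uint64_t)sat_u8(b) << (8 * (i + 4));
    }
    return r;
}
```

Note that the inputs are interpreted as signed, so a source word of FFFFh (that is, -1) packs as 00h rather than FFh.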

The PACKUSWB instruction is an MMX™ instruction. Support for this instruction subset is indicated 
by EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                         Opcode     Description
PACKUSWB mmx1, mmx2/mem64        0F 67 /r   Packs 16-bit signed integers in an MMX register and another MMX register or 64-bit memory location into 8-bit unsigned integers in an MMX register.


[Figure: PACKUSWB operation on mmx1 and mmx2/mem64]

Related Instructions 

PACKSSDW, PACKSSWB 
rFLAGS Affected

None 

Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PADDB Packed Add Bytes

Adds each packed 8-bit integer value in the first source operand to the corresponding packed 8-bit 
integer in the second source operand and writes the integer result of each addition in the corresponding 
byte of the destination (first source). The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 64-bit memory location. 

The PADDB instruction operates on both signed and unsigned integers. If the result overflows, the 
carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of 
each result are written in the destination. 
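
The per-byte wraparound can be modeled in scalar C. This sketch represents an MMX register as a `uint64_t`; the name `paddb` is a hypothetical helper, not an intrinsic.

```c
#include <stdint.h>

/* Model of PADDB: eight independent 8-bit additions. A carry out of one
   byte is discarded and never propagates into the neighboring byte. */
static uint64_t paddb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint64_t)(uint8_t)(x + y) << (8 * i);  /* keep only the low-order 8 bits */
    }
    return r;
}
```

For instance, adding 01h to a byte holding FFh yields 00h in that byte and leaves the adjacent bytes unaffected.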

The PADDB instruction is an MMX™ instruction. Support for this instruction subset is indicated by 
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode     Description
PADDB mmx1, mmx2/mem64        0F FC /r   Adds packed byte integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.


[Figure: PADDB operation on mmx1 and mmx2/mem64 (paddb-64.eps)]


Related Instructions 

PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PADDD Packed Add Doublewords

Adds each packed 32-bit integer value in the first source operand to the corresponding packed 32-bit 
integer in the second source operand and writes the integer result of each addition in the corresponding 
doubleword of the destination (first source). The first source/destination operand is an MMX register 
and the second source operand is another MMX register or 64-bit memory location. 

The PADDD instruction operates on both signed and unsigned integers. If the result overflows, the 
carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of 
each result are written in the destination. 
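
A scalar C sketch of the two independent doubleword additions follows. It models an MMX register as a `uint64_t`; the name `paddd` is hypothetical.

```c
#include <stdint.h>

/* Model of PADDD: two independent 32-bit additions. A carry out of the
   low doubleword is discarded rather than added to the high doubleword. */
static uint64_t paddd(uint64_t a, uint64_t b)
{
    uint32_t lo = (uint32_t)a + (uint32_t)b;                  /* wraps modulo 2^32 */
    uint32_t hi = (uint32_t)(a >> 32) + (uint32_t)(b >> 32);  /* wraps modulo 2^32 */
    return ((uint64_t)hi << 32) | lo;
}
```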

The PADDD instruction is an MMX™ instruction. Support for this instruction subset is indicated by 
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode     Description
PADDD mmx1, mmx2/mem64        0F FE /r   Adds packed 32-bit integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.


[Figure: PADDD operation. Corresponding doublewords (bits 63-32 and 31-0) of mmx1 and mmx2/mem64 are added (paddd-64.eps).]


Related Instructions 

PADDB, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PADDQ Packed Add Quadwords

Adds each packed 64-bit integer value in the first source operand to the corresponding packed 64-bit 
integer in the second source operand and writes the integer result of each addition in the corresponding 
quadword of the destination (first source). The first source/destination operand is an MMX register 
and the second source operand is another MMX register or 64-bit memory location. 

The PADDQ instruction operates on both signed and unsigned integers. If the result overflows, the 
carry is ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of 
each result are written in the destination. 
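
For the 64-bit MMX form there is a single quadword lane, so the operation reduces to one modular addition. The sketch below models an MMX register as a `uint64_t`; `paddq` is a hypothetical name.

```c
#include <stdint.h>

/* Model of PADDQ (64-bit media form): one 64-bit addition whose carry-out
   is discarded. Unsigned arithmetic in C wraps modulo 2^64, which matches
   "only the low-order 64 bits are written". */
static uint64_t paddq(uint64_t a, uint64_t b)
{
    return a + b;
}
```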

The PADDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by
CPUID Fn0000_0001_EDX[SSE2] = 1. See “CPUID” in Volume 3 for more information about the
CPUID instruction.


Mnemonic                      Opcode     Description
PADDQ mmx1, mmx2/mem64        0F D4 /r   Adds 64-bit integer value in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.


[Figure: PADDQ operation. The quadwords (bits 63-0) in mmx1 and mmx2/mem64 are added (paddq-64.eps).]


Related Instructions 

PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW, PADDW 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PADDSB Packed Add Signed with Saturation Bytes

Adds each packed 8-bit signed integer value in the first source operand to the corresponding packed
8-bit signed integer in the second source operand and writes the signed integer result of each addition in
the corresponding byte of the destination (first source). The first source/destination operand is an 
MMX register and the second source operand is another MMX register or 64-bit memory location. 

For each packed value in the destination, if the value is larger than the largest representable signed
8-bit integer, it is saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is
saturated to 80h.
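
Signed saturation per byte can be sketched in scalar C as shown below. The model treats an MMX register as a `uint64_t`; `sat_s8` and `paddsb` are hypothetical names, and the signed casts assume two's-complement conversion.

```c
#include <stdint.h>

/* Clamp an intermediate sum to the signed 8-bit range. */
static int8_t sat_s8(int v)
{
    if (v > INT8_MAX) return INT8_MAX;   /* saturates to 7Fh */
    if (v < INT8_MIN) return INT8_MIN;   /* saturates to 80h */
    return (int8_t)v;
}

/* Model of PADDSB: eight independent signed byte additions with saturation. */
static uint64_t paddsb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int8_t)(uint8_t)(a >> (8 * i));
        int y = (int8_t)(uint8_t)(b >> (8 * i));
        /* The sum is formed at int width, so it cannot wrap before clamping. */
        r |= (uint64_t)(uint8_t)sat_s8(x + y) << (8 * i);
    }
    return r;
}
```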

The PADDSB instruction is an MMX™ instruction. Support for this instruction subset is indicated by 
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                       Opcode     Description
PADDSB mmx1, mmx2/mem64        0F EC /r   Adds packed byte signed integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.


[Figure: PADDSB operation. Corresponding bytes of mmx1 and mmx2/mem64 are added and each result is saturated (paddsb-64.eps).]


Related Instructions 

PADDB, PADDD, PADDQ, PADDSW, PADDUSB, PADDUSW, PADDW 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
PADDSW Packed Add Signed with Saturation Words

Adds each packed 16-bit signed integer value in the first source operand to the corresponding packed 
16-bit signed integer in the second source operand and writes the signed integer result of each addition 
in the corresponding word of the destination (first source). The first source/destination operand is an 
MMX register and the second source operand is another MMX register or 64-bit memory location. 

For each packed value in the destination, if the value is larger than the largest representable signed
16-bit integer, it is saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it
is saturated to 8000h.
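
The word-sized variant follows the same pattern. This scalar C sketch models an MMX register as a `uint64_t`; `sat_s16` and `paddsw` are hypothetical names, and the signed casts assume two's-complement conversion.

```c
#include <stdint.h>

/* Clamp an intermediate sum to the signed 16-bit range. */
static int16_t sat_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;   /* saturates to 7FFFh */
    if (v < INT16_MIN) return INT16_MIN;   /* saturates to 8000h */
    return (int16_t)v;
}

/* Model of PADDSW: four independent signed word additions with saturation. */
static uint64_t paddsw(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int32_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int32_t y = (int16_t)(uint16_t)(b >> (16 * i));
        /* The sum is formed at 32-bit width, so it cannot wrap before clamping. */
        r |= (uint64_t)(uint16_t)sat_s16(x + y) << (16 * i);
    }
    return r;
}
```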

The PADDSW instruction is an MMX™ instruction. Support for this instruction subset is indicated by 
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                       Opcode     Description
PADDSW mmx1, mmx2/mem64        0F ED /r   Adds packed 16-bit signed integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.


[Figure: PADDSW operation. Corresponding words (bits 63-48, 47-32, 31-16, 15-0) of mmx1 and mmx2/mem64 are added and each result is saturated (paddsw-64.eps).]


Related Instructions 

PADDB, PADDD, PADDQ, PADDSB, PADDUSB, PADDUSW, PADDW 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.




PADDUSB Packed Add Unsigned with Saturation Bytes 

Adds each packed 8-bit unsigned integer value in the first source operand to the corresponding packed 
8-bit unsigned integer in the second source operand and writes the unsigned integer result of each 
addition in the corresponding byte of the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

For each packed value in the destination, if the value is larger than the largest unsigned 8-bit integer, it 
is saturated to FFh. 
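
Unsigned saturation has only an upper bound, since a sum of unsigned values cannot fall below zero. The scalar C sketch below models an MMX register as a `uint64_t`; `paddusb` is a hypothetical name.

```c
#include <stdint.h>

/* Model of PADDUSB: eight independent unsigned byte additions,
   each clamped to FFh on overflow. */
static uint64_t paddusb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (uint8_t)(a >> (8 * i));
        unsigned y = (uint8_t)(b >> (8 * i));
        unsigned sum = x + y;          /* at most 1FEh, so no wrap at this width */
        if (sum > 0xFFu) sum = 0xFFu;  /* saturate to FFh */
        r |= (uint64_t)sum << (8 * i);
    }
    return r;
}
```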

The PADDUSB instruction is an MMX™ instruction. Support for this instruction subset is indicated 
by EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction. 


Mnemonic                        Opcode     Description
PADDUSB mmx1, mmx2/mem64        0F DC /r   Adds packed byte unsigned integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.


[Figure: PADDUSB operation. Corresponding bytes of mmx1 and mmx2/mem64 are added and each result is saturated (paddusb-64.eps).]


Related Instructions 

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSW, PADDW 

rFLAGS Affected 

None 
Exceptions

Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PADDUSW Packed Add Unsigned with Saturation Words 

Adds each packed 16-bit unsigned integer value in the first source operand to the corresponding 
packed 16-bit unsigned integer in the second source operand and writes the unsigned integer result of 
each addition in the corresponding word of the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

For each packed value in the destination, if the value is larger than the largest unsigned 16-bit integer, 
it is saturated to FFFFh. 
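
As an informal illustration of word-wise saturation, the sketch below models the operands as 8-byte little-endian buffers, the way a 64-bit memory operand would be laid out. The function name `paddusw` and the sample values are hypothetical and are not part of the manual.

```python
import struct

def paddusw(mmx1: bytes, mmx2: bytes) -> bytes:
    """Model of PADDUSW on two 8-byte little-endian operands."""
    words1 = struct.unpack("<4H", mmx1)    # four unsigned 16-bit lanes
    words2 = struct.unpack("<4H", mmx2)
    # Each sum that exceeds FFFFh is clamped to FFFFh.
    sums = [min(a + b, 0xFFFF) for a, b in zip(words1, words2)]
    return struct.pack("<4H", *sums)

a = struct.pack("<4H", 0xFFF0, 0x0001, 0x8000, 0x0000)
b = struct.pack("<4H", 0x0020, 0x0002, 0x8000, 0x0000)
result = struct.unpack("<4H", paddusw(a, b))
```

In this example the first and third lanes overflow 16 bits and are saturated, while the second and fourth lanes hold the ordinary sums.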

The PADDUSW instruction is an MMX™ instruction. Support for this instruction subset is indicated
by EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PADDUSW mmx1, mmx2/mem64 | 0F DD /r | Adds packed 16-bit unsigned integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: paddusw-64.eps (mmx1 and mmx2/mem64 as four words, bits 63-48, 47-32, 31-16, 15-0; per-word add, then saturate)]


Related Instructions 

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PADDW Packed Add Words

Adds each packed 16-bit integer value in the first source operand to the corresponding packed 16-bit 
integer in the second source operand and writes the integer result of each addition in the corresponding 
word of the destination (first source). The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 64-bit memory location. 

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of the 
result are written in the destination. 
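
The wraparound behavior, and the fact that signed and unsigned operands produce the same bit pattern, can be seen in a short software model. This is a sketch under stated assumptions: `paddw` is a hypothetical helper name, and a Python integer stands in for the 64-bit MMX register.

```python
def paddw(mmx1: int, mmx2: int) -> int:
    """Model of PADDW: add four 16-bit lanes, keeping only the low 16 bits of each sum."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        a = (mmx1 >> shift) & 0xFFFF
        b = (mmx2 >> shift) & 0xFFFF
        result |= ((a + b) & 0xFFFF) << shift   # the carry out of bit 15 is discarded
    return result

# FFFFh + 0001h wraps to 0000h, whether FFFFh is read as 65535 or as -1.
# 7FFFh + 0001h gives 8000h; the lane below does not carry into it.
wrapped = paddw(0x000000007FFFFFFF, 0x0000000000010001)
```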

The PADDW instruction is an MMX™ instruction. Support for this instruction subset is indicated by
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PADDW mmx1, mmx2/mem64 | 0F FD /r | Adds packed 16-bit integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: paddw-64.eps (mmx1 and mmx2/mem64 as four words, bits 63-48, 47-32, 31-16, 15-0; per-word add)]


Related Instructions 

PADDB, PADDD, PADDQ, PADDSB, PADDSW, PADDUSB, PADDUSW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PAND Packed Logical Bitwise AND

Performs a bitwise logical AND of the values in the first and second source operands and writes the 
result in the destination (first source). The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 64-bit memory location. 

The PAND instruction is an MMX™ instruction. Support for this instruction subset is indicated by
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PAND mmx1, mmx2/mem64 | 0F DB /r | Performs bitwise logical AND of values in an MMX register and in another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: mmx1 and mmx2/mem64 operands]



Related Instructions 

PANDN, POR, PXOR 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PANDN Packed Logical Bitwise AND NOT

Performs a bitwise logical AND of the value in the second source operand and the one’s complement 
of the value in the first source operand and writes the result in the destination (first source). The first 
source/destination operand is an MMX register and the second source operand is another MMX 
register or 64-bit memory location. 
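
Because it is easy to invert the wrong operand, a small model may help. In the sketch below, Python integers stand in for 64-bit MMX registers; the names `pand`, `pandn`, `select`, and `MASK64` are assumptions for illustration. The `select` helper shows one common way a mask is combined with the two logical instructions, not a usage prescribed by the manual.

```python
MASK64 = (1 << 64) - 1

def pand(mmx1: int, mmx2: int) -> int:
    """Model of PAND: plain bitwise AND of the two operands."""
    return mmx1 & mmx2

def pandn(mmx1: int, mmx2: int) -> int:
    """Model of PANDN: the first (destination) operand is the one that is inverted."""
    return (~mmx1 & MASK64) & mmx2

def select(mask: int, a: int, b: int) -> int:
    """Take the bits of a where mask is 1 and the bits of b where mask is 0."""
    return pand(mask, a) | pandn(mask, b)
```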

The PANDN instruction is an MMX™ instruction. Support for this instruction subset is indicated by
EDX[MMX] = 1, as returned by CPUID Fn0000_0001 or CPUID Fn8000_0001. See “CPUID” in
Volume 3 for more information about the CPUID instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PANDN mmx1, mmx2/mem64 | 0F DF /r | Performs bitwise logical AND NOT of values in an MMX register and in another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: pandn-64.eps (mmx1, bits 63-0, is inverted and then ANDed with mmx2/mem64)]


Related Instructions 

PAND, POR, PXOR 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PAVGB Packed Average Unsigned Bytes

Computes the rounded average of each packed unsigned 8-bit integer value in the first source operand 
and the corresponding packed 8-bit unsigned integer in the second source operand and writes each 
average in the corresponding byte of the destination (first source). The average is computed by adding 
each pair of operands, adding 1 to the 9-bit temporary sum, and then right-shifting the temporary sum 
by one bit position. The destination and source operands are an MMX register and another MMX 
register or 64-bit memory location. 
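
The add, add-one, shift sequence described above can be written out directly. The following is a minimal Python sketch, assuming a 64-bit integer stands in for each operand; the name `pavgb` is used here only as a label for the model.

```python
def pavgb(mmx1: int, mmx2: int) -> int:
    """Model of PAVGB: rounded average of eight unsigned byte pairs."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (mmx1 >> shift) & 0xFF
        b = (mmx2 >> shift) & 0xFF
        temp = a + b + 1                   # temporary sum plus 1 (at most 511, fits in 9 bits)
        result |= (temp >> 1) << shift     # right shift by one bit position
    return result
```

Adding 1 before the shift means an average that falls halfway between two integers is rounded up: 03h and 04h average to 04h, not 03h.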

The PAVGB instruction is a member of both the AMD MMX™ extensions and the SSE1 instruction
sets. Support for the SSE1 instruction subset is indicated by CPUID Fn0000_0001_EDX[SSE] = 1.
Support for AMD’s extensions to the MMX instruction subset is indicated by CPUID
Fn8000_0001_EDX[MmxExt] = 1. See “CPUID” in Volume 3 for more information about the CPUID
instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PAVGB mmx1, mmx2/mem64 | 0F E0 /r | Averages packed 8-bit unsigned integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: pavgb-64.eps (mmx1 and mmx2/mem64, bits 63-0; per-byte average)]


Related Instructions 

PAVGW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0; and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PAVGUSB Packed Average Unsigned Bytes

Computes the rounded-up average of each packed unsigned 8-bit integer value in the first source 
operand and the corresponding packed 8-bit unsigned integer in the second source operand and writes 
each average in the corresponding byte of the destination (first source). The average is computed by 
adding each pair of operands, adding 1 to the 9-bit temporary sum, and then right-shifting the 
temporary sum by one bit position. The first source/destination operand is an MMX register. The 
second source operand is another MMX register or 64-bit memory location. 

The PAVGUSB instruction performs a function identical to the 64-bit version of the PAVGB 
instruction, although the two instructions have different opcodes. PAVGUSB is a 3DNow! instruction. 
It is useful for pixel averaging in MPEG-2 motion compensation and video scaling operations. 

The PAVGUSB instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by a CPUID feature bit. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on
page 337.

Recommended Instruction Substitution 

PAVGB 


| Mnemonic | Opcode | Description |
|---|---|---|
| PAVGUSB mmx1, mmx2/mem64 | 0F 0F /r BF | Averages packed 8-bit unsigned integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: pavgusb.eps (mmx1 and mmx2/mem64, bits 63-0; per-byte average)]


Related Instructions 

None 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PAVGW Packed Average Unsigned Words

Computes the rounded average of each packed unsigned 16-bit integer value in the first source 
operand and the corresponding packed 16-bit unsigned integer in the second source operand and writes 
each average in the corresponding word of the destination (first source). The average is computed by 
adding each pair of operands, adding 1 to the 17-bit temporary sum, and then right-shifting the 
temporary sum by one bit position. The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 64-bit memory location. 
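
A worked example for the word form is sketched below. It is an informal model, not AMD's definition; `pavgw` and `SHIFTS` are hypothetical names, and Python integers model the 64-bit operands.

```python
SHIFTS = (0, 16, 32, 48)   # bit offsets of the four 16-bit lanes

def pavgw(mmx1: int, mmx2: int) -> int:
    """Model of PAVGW: rounded average of four unsigned word pairs."""
    words1 = [(mmx1 >> s) & 0xFFFF for s in SHIFTS]
    words2 = [(mmx2 >> s) & 0xFFFF for s in SHIFTS]
    # The temporary sum a + b + 1 is at most 1FFFFh, so it fits in 17 bits.
    averages = [(a + b + 1) >> 1 for a, b in zip(words1, words2)]
    return sum(avg << s for avg, s in zip(averages, SHIFTS))

# Lanes, low to high: avg(3, 4) = 4, avg(FFFFh, FFFFh) = FFFFh, avg(1, 2) = 2, avg(0, 0) = 0.
example = pavgw(0x00000001FFFF0003, 0x00000002FFFF0004)
```

The second lane shows why the wider temporary sum matters: FFFFh + FFFFh + 1 does not fit in 16 bits, yet the average is still computed exactly.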

The PAVGW instruction is a member of both the AMD MMX™ extensions and the SSE1 instruction
sets. Support for the SSE1 instruction subset is indicated by CPUID Fn0000_0001_EDX[SSE] = 1.
Support for AMD’s extensions to the MMX instruction subset is indicated by CPUID
Fn8000_0001_EDX[MmxExt] = 1. See “CPUID” in Volume 3 for more information about the CPUID
instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PAVGW mmx1, mmx2/mem64 | 0F E3 /r | Averages packed 16-bit unsigned integer values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure: pavgw-64.eps (mmx1 and mmx2/mem64 as four words, bits 63-48, 47-32, 31-16, 15-0; per-word average)]


Related Instructions 

PAVGB 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0; and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PCMPEQB Packed Compare Equal Bytes

Compares corresponding packed bytes in the first and second source operands and writes the result of
each compare in the corresponding byte of the destination (first source). For each pair of bytes, if the
values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first
source/destination operand is an MMX register and the second source operand is another MMX
register or 64-bit memory location.
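
The per-byte mask that the comparison produces can be sketched in software as follows. This is an illustrative model only; `pcmpeqb` here is a hypothetical Python function operating on integers that stand in for the 64-bit operands.

```python
def pcmpeqb(mmx1: int, mmx2: int) -> int:
    """Model of PCMPEQB: each result byte is FFh where the bytes match, 00h otherwise."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (mmx1 >> shift) & 0xFF
        b = (mmx2 >> shift) & 0xFF
        if a == b:
            result |= 0xFF << shift        # all 1s in this byte lane
    return result

mask = pcmpeqb(0x1122334455667788, 0x1100330055007700)
```

Here every other byte matches, so the mask alternates between FFh and 00h.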

The PCMPEQB instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PCMPEQB mmx1, mmx2/mem64 | 0F 74 /r | Compares packed bytes in an MMX register and an MMX register or 64-bit memory location. |

[Figure: pcmpeqb-64.eps (mmx1 and mmx2/mem64, bits 63-0; per-byte compare, each result all 1s or 0s)]


Related Instructions 

PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PCMPEQD Packed Compare Equal Doublewords

Compares corresponding packed 32-bit values in the first and second source operands and writes the
result of each compare in the corresponding 32 bits of the destination (first source). For each pair of
doublewords, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s.
The first source/destination operand is an MMX register and the second source operand is another
MMX register or 64-bit memory location.
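
The equality compares for bytes, words, and doublewords differ only in lane width, which a parameterized model makes visible. The sketch below is an assumption-laden illustration: `pcmpeq` with a `lane_bits` parameter is a hypothetical helper, not an instruction, and `pcmpeqd` is simply that helper fixed at 32 bits.

```python
def pcmpeq(mmx1: int, mmx2: int, lane_bits: int) -> int:
    """Compare equal-width lanes of two 64-bit values; matching lanes become all 1s."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 64, lane_bits):
        a = (mmx1 >> shift) & lane_mask
        b = (mmx2 >> shift) & lane_mask
        if a == b:
            result |= lane_mask << shift
    return result

def pcmpeqd(mmx1: int, mmx2: int) -> int:
    """Model of PCMPEQD: two 32-bit lanes."""
    return pcmpeq(mmx1, mmx2, 32)
```

For operands that differ only in their lowest byte, the doubleword compare clears the whole low 32 bits, whereas a byte-wide compare of the same operands would clear only the low 8 bits.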

The PCMPEQD instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PCMPEQD mmx1, mmx2/mem64 | 0F 76 /r | Compares packed doublewords in an MMX register and an MMX register or 64-bit memory location. |

[Figure: pcmpeqd-64.eps (mmx1 and mmx2/mem64 as two doublewords, bits 63-32 and 31-0; per-doubleword compare, each result all 1s or 0s)]


Related Instructions 

PCMPEQB, PCMPEQW, PCMPGTB, PCMPGTD, PCMPGTW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PCMPEQW Packed Compare Equal Words

Compares corresponding packed 16-bit values in the first and second source operands and writes the
result of each compare in the corresponding 16 bits of the destination (first source). For each pair of
words, if the values are equal, the result is all 1s. If the values are not equal, the result is all 0s. The first
source/destination operand is an MMX register and the second source operand is another MMX
register or 64-bit memory location.

The PCMPEQW instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PCMPEQW mmx1, mmx2/mem64 | 0F 75 /r | Compares packed 16-bit values in an MMX register and an MMX register or 64-bit memory location. |

[Figure: pcmpeqw-64.eps (mmx1 and mmx2/mem64 as four words, bits 63-48, 47-32, 31-16, 15-0; per-word compare, each result all 1s or 0s)]


Related Instructions 

PCMPEQB, PCMPEQD, PCMPGTB, PCMPGTD, PCMPGTW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PCMPGTB Packed Compare Greater Than Signed Bytes

Compares corresponding packed signed bytes in the first and second source operands and writes the
result of each compare in the corresponding byte of the destination (first source). For each pair of
bytes, if the value in the first source operand is greater than the value in the second source operand, the
result is all 1s. If the value in the first source operand is less than or equal to the value in the second
source operand, the result is all 0s. The first source/destination operand is an MMX register and the
second source operand is another MMX register or 64-bit memory location.
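
Because the comparison is signed, a byte such as 80h (-128) compares as less than 7Fh (127). The following Python sketch models that interpretation; `to_signed8` and `pcmpgtb` are hypothetical helper names, and integers stand in for the 64-bit operands.

```python
def to_signed8(byte: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a two's-complement value (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte

def pcmpgtb(mmx1: int, mmx2: int) -> int:
    """Model of PCMPGTB: FFh where the signed byte of mmx1 exceeds that of mmx2."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = to_signed8((mmx1 >> shift) & 0xFF)
        b = to_signed8((mmx2 >> shift) & 0xFF)
        if a > b:                          # equal values produce 00h
            result |= 0xFF << shift
    return result

mask = pcmpgtb(0x7F800100FF050500, 0x807F0001000504FF)
```

Reading the bytes from high to low, the pairs (7Fh, 80h), (01h, 00h), (05h, 04h), and (00h, FFh) satisfy the signed greater-than test; the pair (FFh, 00h) does not, because FFh is -1.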

The PCMPGTB instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PCMPGTB mmx1, mmx2/mem64 | 0F 64 /r | Compares packed signed bytes in an MMX register and an MMX register or 64-bit memory location. |

[Figure: pcmpgtb-64.eps (mmx1 and mmx2/mem64, bits 63-0; per-byte compare, each result all 1s or 0s)]


Related Instructions 

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTD, PCMPGTW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PCMPGTD Packed Compare Greater Than Signed Doublewords

Compares corresponding packed signed 32-bit values in the first and second source operands and
writes the result of each compare in the corresponding 32 bits of the destination (first source). For each
pair of doublewords, if the value in the first source operand is greater than the value in the second
source operand, the result is all 1s. If the value in the first source operand is less than or equal to the
value in the second source operand, the result is all 0s. The first source/destination operand is an MMX
register and the second source operand is another MMX register or 64-bit memory location.
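
A compact way to model the signed doubleword compare is to let the standard `struct` module perform the two's-complement interpretation. This is a sketch under assumptions: `pcmpgtd` is a hypothetical function name and Python integers stand in for the 64-bit operands.

```python
import struct

def pcmpgtd(mmx1: int, mmx2: int) -> int:
    """Model of PCMPGTD: FFFFFFFFh where the signed doubleword of mmx1 exceeds that of mmx2."""
    # "<2i" reads two little-endian signed 32-bit values: (bits 31-0, bits 63-32).
    a = struct.unpack("<2i", mmx1.to_bytes(8, "little"))
    b = struct.unpack("<2i", mmx2.to_bytes(8, "little"))
    masks = [0xFFFFFFFF if x > y else 0 for x, y in zip(a, b)]
    return masks[0] | (masks[1] << 32)

# Low doubleword: FFFFFFFFh (-1) is not greater than 0.
# High doubleword: 1 is greater than FFFFFFFFh (-1).
mask = pcmpgtd(0x00000001FFFFFFFF, 0xFFFFFFFF00000000)
```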

The PCMPGTD instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PCMPGTD mmx1, mmx2/mem64 | 0F 66 /r | Compares packed signed 32-bit values in an MMX register and an MMX register or 64-bit memory location. |

[Figure: pcmpgtd-64.eps (mmx1 and mmx2/mem64 as two doublewords, bits 63-32 and 31-0; per-doubleword compare, each result all 1s or 0s)]


Related Instructions 

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTW 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PCMPGTW Packed Compare Greater Than Signed Words

Compares corresponding packed signed 16-bit values in the first and second source operands and
writes the result of each compare in the corresponding 16 bits of the destination (first source). For each
pair of words, if the value in the first source operand is greater than the value in the second source
operand, the result is all 1s. If the value in the first source operand is less than or equal to the value in
the second source operand, the result is all 0s. The first source/destination operand is an MMX register
and the second source operand is another MMX register or 64-bit memory location.
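
The all-1s or all-0s result is convenient as a selection mask. The sketch below models the compare and then, as one possible use that the manual does not prescribe, combines the mask with AND and AND NOT operations to pick the larger signed word in each lane. All function names are hypothetical, and Python integers stand in for the 64-bit operands.

```python
MASK64 = (1 << 64) - 1

def _signed16(word: int) -> int:
    """Reinterpret an unsigned word as a two's-complement value (-32768..32767)."""
    return word - 0x10000 if word & 0x8000 else word

def pcmpgtw(mmx1: int, mmx2: int) -> int:
    """Model of PCMPGTW: FFFFh where the signed word of mmx1 exceeds that of mmx2."""
    result = 0
    for shift in (0, 16, 32, 48):
        a = _signed16((mmx1 >> shift) & 0xFFFF)
        b = _signed16((mmx2 >> shift) & 0xFFFF)
        if a > b:
            result |= 0xFFFF << shift
    return result

def signed_max_words(x: int, y: int) -> int:
    """Per-lane signed maximum built from the compare mask."""
    mask = pcmpgtw(x, y)
    return (x & mask) | (y & ~mask & MASK64)

x, y = 0x000180007FFFFFFF, 0x00007FFF80000000
```

With these sample operands the mask is FFFFh in the lanes at bits 63-48 and 31-16, so the maximum takes those lanes from `x` and the other two from `y`.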

The PCMPGTW instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PCMPGTW mmx1, mmx2/mem64 | 0F 65 /r | Compares packed signed 16-bit values in an MMX register and an MMX register or 64-bit memory location. |

[Figure: pcmpgtw-64.eps (mmx1 and mmx2/mem64 as four words, bits 63-48, 47-32, 31-16, 15-0; per-word compare, each result all 1s or 0s)]


Related Instructions 

PCMPEQB, PCMPEQD, PCMPEQW, PCMPGTB, PCMPGTD 

rFLAGS Affected 

None 

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PEXTRW Extract Packed Word

Extracts a 16-bit value from an MMX register, as selected by the immediate byte operand (as shown in 
Table 1-1) and writes it to the low-order word of a 32-bit general-purpose register, with zero-extension 
to 32 bits. 

The PEXTRW instruction is a member of both the AMD MMX™ extensions and the SSE1 instruction
set. Support for the SSE1 instruction subset is indicated by CPUID Fn0000_0001_EDX[SSE] = 1.
Support for AMD’s extensions to the MMX instruction subset is indicated by CPUID
Fn8000_0001_EDX[MmxExt] = 1. See “CPUID” in Volume 3 for more information about the CPUID
instruction.

| Mnemonic | Opcode | Description |
|---|---|---|
| PEXTRW reg32, mmx, imm8 | 0F C5 /r ib | Extracts a 16-bit value from an MMX register and writes it to low-order 16 bits of a general-purpose register. |

[Figure: pextrw-64.eps (imm8, bits 7-0, drives a mux that selects one word of mmx, bits 63-48, 47-32, 31-16, or 15-0; the selected word is written to bits 15-0 of reg32)]


Table 1-1. Immediate-Byte Operand Encoding for 64-Bit PEXTRW

| Immediate-Byte Bit Field | Value of Bit Field | Source Bits Extracted |
|---|---|---|
| 1-0 | 0 | 15-0 |
| | 1 | 31-16 |
| | 2 | 47-32 |
| | 3 | 63-48 |
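As a rough illustration of Table 1-1, the following Python sketch models the word selection on plain integers. The function name `pextrw` and the integer encoding of the MMX register are assumptions made for this example, as is the masking of immediate bits 7-2, which the table implies but does not state.

```python
def pextrw(mmx: int, imm8: int) -> int:
    """Model PEXTRW: pick one 16-bit word of a 64-bit MMX value.

    Returns the value written to reg32 (the selected word, zero-extended).
    """
    select = imm8 & 0b11                  # only immediate bits 1-0 choose the word
    return (mmx >> (16 * select)) & 0xFFFF


mmx_value = 0x4444_3333_2222_1111
words = [pextrw(mmx_value, i) for i in range(4)]
```

Here `words` holds the four source words in order from bits 15-0 up to bits 63-48.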


Related Instructions 

PINSRW 


rFLAGS Affected

None

Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0; and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |

PF2ID Packed Floating-Point to Integer Doubleword Conversion

Converts two packed single-precision floating-point values in an MMX register or a 64-bit memory 
location to two packed 32-bit signed integer values and writes the converted values in another MMX 
register. If the result of the conversion is an inexact value, the value is truncated (rounded toward 
zero). The numeric range for source and destination operands is shown in Table 1-2 on page 89. 

The PF2ID instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution

CVTTPS2DQ

| Mnemonic | Opcode | Description |
|---|---|---|
| PF2ID mmx1, mmx2/mem64 | 0F 0F /r 1D | Converts packed single-precision floating-point values in an MMX register or memory location to a doubleword integer value in the destination MMX register. |

[Figure pf2id.eps: each doubleword (bits 63-32 and 31-0) of mmx2/mem64 is converted and written to the corresponding doubleword of mmx1.]

Table 1-2. Numeric Range for PF2ID Results

| Source 2 | Source 1 and Destination |
|---|---|
| 0 | 0 |
| Normal, abs(Source 2) < 1 | 0 |
| Normal, -2^31 < Source 2 <= -1 | Round to zero (Source 2) |
| Normal, 1 <= Source 2 < 2^31 | Round to zero (Source 2) |
| Normal, Source 2 >= 2^31 | 7FFF_FFFFh |
| Normal, Source 2 <= -2^31 | 8000_0000h |
| Unsupported | Undefined |
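The rows of Table 1-2 can be sketched as a small Python reference model. This is only an illustration under stated assumptions: the helper names (`f32_to_bits`, `bits_to_f32`, `pf2id_lane`, `pf2id`) are invented here, registers are modeled as Python integers, and the "Undefined" result for an unsupported source is modeled by raising an exception.

```python
import math
import struct


def f32_to_bits(x: float) -> int:
    """Encode a Python float as IEEE 754 single-precision bits."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Decode 32 single-precision bits into a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF_FFFF))[0]


def pf2id_lane(bits: int) -> int:
    """Convert one single-precision lane to a signed doubleword (raw bits)."""
    if ((bits >> 23) & 0xFF) == 0xFF:
        # Exponent all ones: "Unsupported" source, result is undefined.
        raise ValueError("unsupported source operand")
    value = bits_to_f32(bits)
    if value >= 2**31:
        return 0x7FFF_FFFF                    # saturate positive
    if value <= -(2**31):
        return 0x8000_0000                    # saturate negative
    return math.trunc(value) & 0xFFFF_FFFF    # round toward zero


def pf2id(src: int) -> int:
    """Apply the conversion to both doublewords of a 64-bit source."""
    low = pf2id_lane(src & 0xFFFF_FFFF)
    high = pf2id_lane(src >> 32)
    return (high << 32) | low


packed = (f32_to_bits(-2.75) << 32) | f32_to_bits(2.75)
result = pf2id(packed)
```

With these inputs the low doubleword becomes 2 and the high doubleword becomes -2, showing that truncation moves both values toward zero.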


Related Instructions 

PF2IW, PI2FD, PI2FW 

rFLAGS Affected 

None 


Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PF2IW Packed Floating-Point to Integer Word Conversion

Converts two packed single-precision floating-point values in an MMX register or a 64-bit memory 
location to two packed 16-bit signed integer values, sign-extended to 32 bits, and writes the converted 
values in another MMX register. If the result of the conversion is an inexact value, the value is 
truncated (rounded toward zero). The numeric range for source and destination operands is shown in 
Table 1-3 on page 91. Arguments outside the range representable by signed 16-bit integers are 
saturated to the largest and smallest 16-bit integer, depending on their sign. 

The PF2IW instruction is an extension to the AMD 3DNow!™ instruction set. The presence of this 
instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information 
about the CPUID instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution

CVTTPS2DQ

| Mnemonic | Opcode | Description |
|---|---|---|
| PF2IW mmx1, mmx2/mem64 | 0F 0F /r 1C | Converts packed single-precision floating-point values in an MMX register or memory location to word integer values in the destination MMX register. |

[Figure pf2iw.eps: each doubleword (bits 63-32 and 31-0) of mmx2/mem64 is converted and written to the corresponding doubleword of mmx1.]

Table 1-3. Numeric Range for PF2IW Results

| Source 2 | Source 1 and Destination |
|---|---|
| 0 | 0 |
| Normal, abs(Source 2) < 1 | 0 |
| Normal, -2^15 < Source 2 <= -1 | Round to zero (Source 2) |
| Normal, 1 <= Source 2 < 2^15 | Round to zero (Source 2) |
| Normal, Source 2 >= 2^15 | 0000_7FFFh |
| Normal, Source 2 <= -2^15 | FFFF_8000h |
| Unsupported | Undefined |
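The saturation and sign-extension behavior in Table 1-3 can be sketched in Python for a single lane. The names `f32_to_bits`, `bits_to_f32`, and `pf2iw_lane` are hypothetical, and raising an exception for an unsupported source is a modeling choice, since the manual leaves that result undefined.

```python
import math
import struct


def f32_to_bits(x: float) -> int:
    """Encode a Python float as IEEE 754 single-precision bits."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Decode 32 single-precision bits into a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF_FFFF))[0]


def pf2iw_lane(bits: int) -> int:
    """Convert one lane to a signed 16-bit integer held in a 32-bit doubleword."""
    if ((bits >> 23) & 0xFF) == 0xFF:
        raise ValueError("unsupported source operand")   # result is undefined
    truncated = math.trunc(bits_to_f32(bits))            # round toward zero
    saturated = max(-0x8000, min(0x7FFF, truncated))     # clamp to signed 16-bit range
    # Two's-complement encoding in 32 bits, i.e. the 16-bit result sign-extended.
    return saturated & 0xFFFF_FFFF


samples = [12.99, -1.9, 100000.0, -100000.0]
converted = [pf2iw_lane(f32_to_bits(x)) for x in samples]
```

The last two samples fall outside the signed 16-bit range and therefore saturate to 0000_7FFFh and FFFF_8000h.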


Related Instructions 

PF2ID, PI2FD, PI2FW 

rFLAGS Affected 

None 


Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD extensions to 3DNow!™ are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNowExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFACC Packed Floating-Point Accumulate

Adds the two single-precision floating-point values in the first source operand and writes the sum to
the low-order doubleword of the destination (first source). Adds the two single-precision values in the
second source operand and writes the sum to the high-order doubleword of the destination. The first
source/destination operand is an MMX register. The second source operand is another MMX register
or 64-bit memory location.

The PFACC instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

HADDPS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFACC mmx1, mmx2/mem64 | 0F 0F /r AE | Accumulates packed single-precision floating-point values in an MMX register or 64-bit memory location and another MMX register and writes each result in the destination MMX register. |

The numeric range for operands is shown in Table 1-4 on page 93.

Table 1-4. Numeric Range for PFACC Results

| Low Operand (1) | High Operand (2): 0 | High Operand (2): Normal | High Operand (2): Unsupported |
|---|---|---|---|
| 0 | +/-0 (3) | High Operand | High Operand |
| Normal | Low Operand | Normal, +/-0 (4) | Undefined |
| Unsupported (5) | Low Operand | Undefined | Undefined |

Note:

1. Least-significant floating-point value in first or second source operand.

2. Most-significant floating-point value in first or second source operand.

3. The sign of the result is the logical AND of the signs of the low and high operands.

4. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is a zero
with the sign of the operand (low or high) that is larger in magnitude. If the infinitely precise result is
exactly zero, the result is zero with the sign of the low operand. If the absolute value of the infinitely
precise result is greater than or equal to 2^128, the result is the largest normal number with the sign of
the low operand.

5. “Unsupported” means that the exponent is all ones (1s).
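The lane routing of PFACC can be illustrated with a short Python sketch. It covers only normal operands and does not model the zero-sign, clamping, or unsupported-operand cases in the notes above. The helper names and the use of Python integers for registers are assumptions, and rounding of each sum to single precision is delegated to `struct.pack`.

```python
import struct

MASK32 = 0xFFFF_FFFF


def f32_to_bits(x: float) -> int:
    """Encode a Python float as IEEE 754 single-precision bits."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Decode 32 single-precision bits into a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits & MASK32))[0]


def pack2(low: float, high: float) -> int:
    """Build a 64-bit value from two single-precision lanes."""
    return (f32_to_bits(high) << 32) | f32_to_bits(low)


def pfacc(dest: int, src: int) -> int:
    """Low result = sum of dest lanes; high result = sum of src lanes."""
    dest_sum = bits_to_f32(dest & MASK32) + bits_to_f32(dest >> 32)
    src_sum = bits_to_f32(src & MASK32) + bits_to_f32(src >> 32)
    return pack2(dest_sum, src_sum)


result = pfacc(pack2(1.5, 2.25), pack2(10.0, -4.0))
```

Here the low doubleword of `result` holds 1.5 + 2.25 and the high doubleword holds 10.0 + (-4.0).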


Related Instructions 

PFADD, PFNACC, PFPNACC 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFADD Packed Floating-Point Add

Adds each packed single-precision floating-point value in the first source operand to the 
corresponding packed single-precision floating-point value in the second operand and writes the result 
of each addition in the corresponding doubleword of the destination (first source). The first 
source/destination operand is an MMX register. The second source operand is another MMX register 
or 64-bit memory location. The numeric range for operands is shown in Table 1-5 on page 95. 

The PFADD instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

ADDPS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFADD mmx1, mmx2/mem64 | 0F 0F /r 9E | Adds two packed single-precision floating-point values in an MMX register or 64-bit memory location and another MMX register and writes each result in the destination MMX register. |

[Figure pfadd.eps: the doublewords at bits 63-32 and 31-0 of mmx1 and mmx2/mem64 are added pairwise; each sum is written to the corresponding doubleword of mmx1.]

Table 1-5. Numeric Range for the PFADD Results

| Source 1 and Destination | Most-Significant Doubleword: 0 | Most-Significant Doubleword: Normal | Most-Significant Doubleword: Unsupported |
|---|---|---|---|
| 0 | +/-0 (1) | Source 2 | Source 2 |
| Normal | Source 1 | Normal, +/-0 (2) | Undefined |
| Unsupported (3) | Source 1 | Undefined | Undefined |

Note:

1. The sign of the result is the logical AND of the signs of the source operands.

2. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is a zero
with the sign of the source operand that is larger in magnitude. If the infinitely precise result is exactly
zero, the result is zero with the sign of source 1. If the absolute value of the infinitely precise result is
greater than or equal to 2^128, the result is the largest normal number with the sign of source 1.

3. “Unsupported” means that the exponent is all ones (1s).
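The per-lane rules in Table 1-5 and its notes can be sketched in Python as follows. This is an illustrative model under several assumptions: the function and constant names are invented, lanes are passed as raw 32-bit patterns, "Undefined" is modeled as an exception, and any sum beyond the largest normal number saturates (slightly broader than the 2^128 threshold in note 2, so that the final rounding step cannot overflow). The rounding of in-range results is left to `struct.pack` and is not taken from the manual.

```python
import struct
from fractions import Fraction

SIGN = 0x8000_0000
MAGNITUDE = 0x7FFF_FFFF
MAX_NORMAL = 0x7F7F_FFFF            # largest normal single-precision magnitude


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF_FFFF))[0]


def is_zero(bits: int) -> bool:
    return (bits & MAGNITUDE) == 0


def is_unsupported(bits: int) -> bool:
    return ((bits >> 23) & 0xFF) == 0xFF     # exponent all ones


def pfadd_lane(a: int, b: int) -> int:
    """Add one pair of lanes given as raw single-precision bits."""
    if is_zero(a) and is_zero(b):
        return a & b & SIGN                  # note 1: AND of the two signs
    if is_zero(a):
        return b                             # table: result is source 2
    if is_zero(b):
        return a                             # table: result is source 1
    if is_unsupported(a) or is_unsupported(b):
        raise ValueError("result is undefined")
    exact = Fraction(bits_to_f32(a)) + Fraction(bits_to_f32(b))
    if exact == 0:
        return a & SIGN                      # exact zero: sign of source 1
    if abs(exact) < Fraction(1, 2**126):
        larger = a if (a & MAGNITUDE) >= (b & MAGNITUDE) else b
        return larger & SIGN                 # underflow: zero, sign of larger operand
    if abs(exact) > Fraction(bits_to_f32(MAX_NORMAL)):
        return (a & SIGN) | MAX_NORMAL       # overflow: largest normal, sign of source 1
    return f32_to_bits(float(exact))


normal_sum = pfadd_lane(f32_to_bits(1.5), f32_to_bits(2.25))
```

Exact rational arithmetic is used only so that the comparisons against 2^-126 and the largest normal number apply to the infinitely precise result, as the note requires.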


Related Instructions 

PFACC, PFNACC, PFPNACC 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFCMPEQ Packed Floating-Point Compare Equal

Compares each of the two packed single-precision floating-point values in the first source operand 
with the corresponding packed single-precision floating-point value in the second source operand and 
writes the result of each comparison in the corresponding doubleword of the destination (first source). 
For each pair of floating-point values, if the values are equal, the result is all 1s. If the values are not
equal, the result is all 0s. The first source/destination operand is an MMX register. The second source
operand is another MMX register or 64-bit memory location. The numeric range for operands is shown 
in Table 1-6 on page 97. 

The PFCMPEQ instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

CMPSS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFCMPEQ mmx1, mmx2/mem64 | 0F 0F /r B0 | Compares two pairs of packed single-precision floating-point values in an MMX register and an MMX register or 64-bit memory location. |

[Figure pfcmpeq.eps: the doublewords at bits 63-32 and 31-0 of mmx1 are compared with the corresponding doublewords of mmx2/mem64; each doubleword of mmx1 receives all 1s or all 0s.]

Table 1-6. Numeric Range for the PFCMPEQ Instruction

| Source 1 and Destination | Source 2: 0 | Source 2: Normal | Source 2: Unsupported |
|---|---|---|---|
| 0 | FFFF_FFFFh (1) | 0000_0000h | 0000_0000h |
| Normal | 0000_0000h | 0000_0000h or FFFF_FFFFh (2) | 0000_0000h |
| Unsupported (3) | 0000_0000h | 0000_0000h | Undefined |

Note:

1. Positive zero is equal to negative zero.

2. The result is FFFF_FFFFh if source 1 and source 2 have identical signs, exponents, and mantissas.
Otherwise, the result is 0000_0000h.

3. “Unsupported” means that the exponent is all ones (1s).
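Table 1-6 reduces to a bitwise comparison with two special cases, which the following Python sketch makes explicit. The names are hypothetical, registers are modeled as Python integers, and the single "Undefined" cell (both operands unsupported) is modeled as an exception.

```python
import struct

ALL_ONES = 0xFFFF_FFFF
MAGNITUDE = 0x7FFF_FFFF


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def pack2(low: float, high: float) -> int:
    """Build a 64-bit value from two single-precision lanes."""
    return (f32_to_bits(high) << 32) | f32_to_bits(low)


def is_unsupported(bits: int) -> bool:
    return ((bits >> 23) & 0xFF) == 0xFF     # exponent all ones


def pfcmpeq_lane(a: int, b: int) -> int:
    """Compare one pair of lanes given as raw single-precision bits."""
    if is_unsupported(a) and is_unsupported(b):
        raise ValueError("result is undefined")
    if (a & MAGNITUDE) == 0 and (b & MAGNITUDE) == 0:
        return ALL_ONES                      # note 1: +0 equals -0
    # Note 2: equal only if sign, exponent, and mantissa are identical.
    return ALL_ONES if a == b else 0


def pfcmpeq(dest: int, src: int) -> int:
    low = pfcmpeq_lane(dest & ALL_ONES, src & ALL_ONES)
    high = pfcmpeq_lane(dest >> 32, src >> 32)
    return (high << 32) | low


mask = pfcmpeq(pack2(-0.0, 2.0), pack2(0.0, 2.5))
```

In this example the low lanes compare equal (negative zero against positive zero) and the high lanes do not.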


Related Instructions 

PFCMPGE, PFCMPGT 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFCMPGE Packed Floating-Point Compare Greater or Equal

Compares each of the two packed single-precision floating-point values in the first source operand 
with the corresponding packed single-precision floating-point value in the second source operand and 
writes the result of each comparison in the corresponding doubleword of the destination (first source). 
For each pair of floating-point values, if the value in the first source operand is greater than or equal to 
the value in the second source operand, the result is all 1s. If the value in the first source operand is less
than the value in the second source operand, the result is all 0s. The first source/destination operand is
an MMX register. The second source operand is another MMX register or 64-bit memory location. 
The numeric range for operands is shown in Table 1-7 on page 99. 

The PFCMPGE instruction is a 3DNow!™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

CMPPS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFCMPGE mmx1, mmx2/mem64 | 0F 0F /r 90 | Compares two pairs of packed single-precision floating-point values in an MMX register and an MMX register or 64-bit memory location. |

[Figure pfcmpge.eps: the doublewords at bits 63-32 and 31-0 of mmx1 are compared with the corresponding doublewords of mmx2/mem64; each doubleword of mmx1 receives all 1s or all 0s.]

Table 1-7. Numeric Range for the PFCMPGE Instruction

| Source 1 and Destination | Source 2: 0 | Source 2: Normal | Source 2: Unsupported |
|---|---|---|---|
| 0 | FFFF_FFFFh (1) | 0000_0000h, FFFF_FFFFh (2) | Undefined |
| Normal | 0000_0000h, FFFF_FFFFh (3) | 0000_0000h, FFFF_FFFFh (4) | Undefined |
| Unsupported (5) | Undefined | Undefined | Undefined |

Note:

1. Positive zero is equal to negative zero.

2. The result is FFFF_FFFFh, if source 2 is negative. Otherwise, the result is 0000_0000h.

3. The result is FFFF_FFFFh, if source 1 is positive. Otherwise, the result is 0000_0000h.

4. The result is FFFF_FFFFh, if source 1 is positive and source 2 is negative, or if they are both negative
and source 1 is smaller than or equal in magnitude to source 2, or if source 1 and source 2 are both
positive and source 1 is greater than or equal in magnitude to source 2. The result is 0000_0000h in all
other cases.

5. “Unsupported” means that the exponent is all ones (1s).
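For supported operands, the cases in Table 1-7 coincide with an ordinary greater-or-equal comparison, as the Python sketch below illustrates. The helper names are assumptions, and "Undefined" is modeled as an exception. The final lines show a common way such all-1s/all-0s masks are consumed, a branch-free lane select; that idiom is an illustration and is not described in this section.

```python
import struct

ALL_ONES = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & ALL_ONES))[0]


def pack2(low: float, high: float) -> int:
    """Build a 64-bit value from two single-precision lanes."""
    return (f32_to_bits(high) << 32) | f32_to_bits(low)


def pfcmpge_lane(a: int, b: int) -> int:
    """Return all 1s if lane a >= lane b, else all 0s."""
    if ((a >> 23) & 0xFF) == 0xFF or ((b >> 23) & 0xFF) == 0xFF:
        raise ValueError("result is undefined")   # unsupported operand
    return ALL_ONES if bits_to_f32(a) >= bits_to_f32(b) else 0


def pfcmpge(dest: int, src: int) -> int:
    low = pfcmpge_lane(dest & ALL_ONES, src & ALL_ONES)
    high = pfcmpge_lane(dest >> 32, src >> 32)
    return (high << 32) | low


a = pack2(-1.0, 3.0)
b = pack2(-2.0, 3.5)
mask = pfcmpge(a, b)
# Branch-free select: keep a where a >= b, otherwise take b.
selected = (a & mask) | (b & ~mask & MASK64)
```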


Related Instructions 

PFCMPEQ, PFCMPGT 

rFLAGS Affected 

None 


Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFCMPGT Packed Floating-Point Compare Greater Than

Compares each of the two packed single-precision floating-point values in the first source operand 
with the corresponding packed single-precision floating-point value in the second source operand and 
writes the result of each comparison in the corresponding doubleword of the destination (first source). 
For each pair of floating-point values, if the value in the first source operand is greater than the value in 
the second source operand, the result is all 1s. If the value in the first source operand is less than or
equal to the value in the second source operand, the result is all 0s. The first source/destination operand
is an MMX register. The second source operand is another MMX register or 64-bit memory location. 
The numeric range for operands is shown in Table 1-8 on page 102. 

The PFCMPGT instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

CMPPS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFCMPGT mmx1, mmx2/mem64 | 0F 0F /r A0 | Compares two pairs of packed single-precision floating-point values in an MMX register and an MMX register or 64-bit memory location. |

[Figure pfcmpgt.eps: the doublewords at bits 63-32 and 31-0 of mmx1 are compared with the corresponding doublewords of mmx2/mem64; each doubleword of mmx1 receives all 1s or all 0s.]

Table 1-8. Numeric Range for the PFCMPGT Instruction

| Source 1 and Destination | Source 2: 0 | Source 2: Normal | Source 2: Unsupported |
|---|---|---|---|
| 0 | 0000_0000h | 0000_0000h, FFFF_FFFFh (1) | Undefined |
| Normal | 0000_0000h, FFFF_FFFFh (2) | 0000_0000h, FFFF_FFFFh (3) | Undefined |
| Unsupported (4) | Undefined | Undefined | Undefined |

Note:

1. The result is FFFF_FFFFh, if source 2 is negative. Otherwise, the result is 0000_0000h.

2. The result is FFFF_FFFFh, if source 1 is positive. Otherwise, the result is 0000_0000h.

3. The result is FFFF_FFFFh, if source 1 is positive and source 2 is negative, or if they are both negative
and source 1 is smaller in magnitude than source 2, or if source 1 and source 2 are positive and source
1 is greater in magnitude than source 2. The result is 0000_0000h in all other cases.

4. “Unsupported” means that the exponent is all ones (1s).


Related Instructions 

PFCMPEQ, PFCMPGE 

rFLAGS Affected 

None 


Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFMAX Packed Single-Precision Floating-Point Maximum

Compares each of the two packed single-precision floating-point values in the first source operand 
with the corresponding packed single-precision floating-point value in the second source operand and 
writes the maximum of the two values for each comparison in the corresponding doubleword of the 
destination (first source). The first source/destination operand is an MMX register. The second source 
operand is another MMX register or 64-bit memory location. 

Any operation with a zero and a negative number returns positive zero. An operation consisting of two 
zeros returns positive zero. If either source operand is an undefined value, the result is undefined. The 
numeric range for source and destination operands is shown in Table 1-9 on page 104. 

The PFMAX instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

MAXPS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFMAX mmx1, mmx2/mem64 | 0F 0F /r A4 | Compares two pairs of packed single-precision values in an MMX register and another MMX register or 64-bit memory location and writes the maximum value of each comparison in the destination MMX register. |

[Figure pfmax.eps: the doublewords at bits 63-32 and 31-0 of mmx1 are compared with the corresponding doublewords of mmx2/mem64; the maximum of each pair is written to the corresponding doubleword of mmx1.]

Table 1-9. Numeric Range for the PFMAX Instruction

| Source 1 and Destination | Source 2: 0 | Source 2: Normal | Source 2: Unsupported |
|---|---|---|---|
| 0 | +0 | Source 2, +0 (1) | Undefined |
| Normal | Source 1, +0 (2) | Source 1/Source 2 (3) | Undefined |
| Unsupported (4) | Undefined | Undefined | Undefined |

Note:

1. The result is source 2, if source 2 is positive. Otherwise, the result is positive zero.

2. The result is source 1, if source 1 is positive. Otherwise, the result is positive zero.

3. The result is source 1, if source 1 is positive and source 2 is negative. The result is source 1, if both are
positive and source 1 is greater in magnitude than source 2. The result is source 1, if both are negative
and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.

4. “Unsupported” means that the exponent is all ones (1s).
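The zero-handling rules in Table 1-9 differ from a plain maximum, which the following single-lane Python sketch tries to make visible. The function names are hypothetical, lanes are passed as raw 32-bit patterns, and "Undefined" is modeled as an exception.

```python
import struct


def f32_to_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF_FFFF))[0]


def pfmax_lane(a: int, b: int) -> int:
    """Return the PFMAX result for one pair of lanes (raw bits)."""
    if ((a >> 23) & 0xFF) == 0xFF or ((b >> 23) & 0xFF) == 0xFF:
        raise ValueError("result is undefined")   # unsupported operand
    x, y = bits_to_f32(a), bits_to_f32(b)
    if x == 0 and y == 0:
        return 0                     # two zeros of either sign: positive zero
    if x == 0:
        return b if y > 0 else 0     # note 1
    if y == 0:
        return a if x > 0 else 0     # note 2
    return a if x > y else b         # note 3


results = [
    pfmax_lane(f32_to_bits(-3.0), f32_to_bits(0.0)),   # zero vs. negative
    pfmax_lane(f32_to_bits(1.0), f32_to_bits(2.5)),    # both positive
    pfmax_lane(f32_to_bits(-1.0), f32_to_bits(-2.0)),  # both negative
]
```

The first case returns positive zero rather than either operand's bit pattern, which matters when a source holds negative zero.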


Related Instructions 

PFMIN 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |

PFMIN Packed Single-Precision Floating-Point Minimum

Compares each of the two packed single-precision floating-point values in the first source operand 
with the corresponding packed single-precision floating-point value in the second source operand and 
writes the minimum of the two values for each comparison in the corresponding doubleword of the 
destination (first source). The first source/destination operand is an MMX register. The second source 
operand is another MMX register or 64-bit memory location. 

Any operation with a zero and a positive number returns positive zero. An operation consisting of two 
zeros returns positive zero. If either source operand is an undefined value, the result is undefined. The 
numeric range for source and destination operands is shown in Table 1-10 on page 106. 

The PFMIN instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

MINPS 


| Mnemonic | Opcode | Description |
|---|---|---|
| PFMIN mmx1, mmx2/mem64 | 0F 0F /r 94 | Compares two pairs of packed single-precision values in an MMX register and another MMX register or 64-bit memory location and writes the minimum value of each comparison in the destination MMX register. |

[Figure pfmin.eps: the doublewords at bits 63-32 and 31-0 of mmx1 are compared with the corresponding doublewords of mmx2/mem64; the minimum of each pair is written to the corresponding doubleword of mmx1.]

Table 1-10. Numeric Range for the PFMIN Instruction

| Source 1 and Destination | Source 2: 0 | Source 2: Normal | Source 2: Unsupported |
|---|---|---|---|
| 0 | +0 | Source 2, +0 (1) | Undefined |
| Normal | Source 1, +0 (2) | Source 1/Source 2 (3) | Undefined |
| Unsupported (4) | Undefined | Undefined | Undefined |

Note:

1. The result is source 2, if source 2 is negative. Otherwise, the result is positive zero.

2. The result is source 1, if source 1 is negative. Otherwise, the result is positive zero.

3. The result is source 1, if source 1 is negative and source 2 is positive. The result is source 1, if both are
negative and source 1 is greater in magnitude than source 2. The result is source 1, if both are positive
and source 1 is lesser in magnitude than source 2. The result is source 2 in all other cases.

4. “Unsupported” means that the exponent is all ones (1s).


Related Instructions 

PFMAX 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |


106    PFMIN    64-Bit Media Instruction Reference

26569—Rev. 3.15—May 2018    AMD64 Technology


PFMUL Packed Floating-Point Multiply 

Multiplies each of the two packed single-precision floating-point values in the first source operand by 
the corresponding packed single-precision floating-point value in the second source operand and 
writes the result of each multiplication in the corresponding doubleword of the destination (first 
source). The numeric range for source and destination operands is shown in Table 1-11 on page 108. 
The first source/destination operand is an MMX register. The second source operand is another MMX 
register or 64-bit memory location. 

The PFMUL instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

MULPS 


Mnemonic                  Opcode       Description
PFMUL mmx1, mmx2/mem64    0F 0F /r B4  Multiplies packed single-precision floating-point values in an MMX register and another MMX register or 64-bit memory location and writes the result in the destination MMX register.

[Figure pfmul.eps: each doubleword (bits 63:32 and 31:0) of mmx1 is multiplied by the corresponding doubleword of mmx2/mem64, and each product is written to the same doubleword of mmx1.]

Table 1-11. Numeric Range for the PFMUL Instruction

                                            Source 2
Operand                   Value             0         Normal             Unsupported
Source 1 and Destination  0                 +/-0 (1)  +/-0 (1)           +/-0 (1)
                          Normal            +/-0 (1)  Normal, +/-0 (2)   Undefined
                          Unsupported (3)   +/-0 (1)  Undefined          Undefined


Note: 

1. The sign of the result is the exclusive-OR of the signs of the source operands.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the
exclusive-OR of the signs of the source operands. If the absolute value of the product is greater than
or equal to 2^128, the result is the largest normal number with the sign being the exclusive-OR of the
signs of the source operands.

3. “Unsupported” means that the exponent is all ones (1s).
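The rules in notes 1 and 2 can be illustrated with a small Python model. This is a sketch under stated assumptions, not a description of any hardware implementation: the names to_single, pfmul_lane, and pfmul are hypothetical, results are rounded to nearest, products between the largest normal value and 2^128 are clamped as well, and unsupported operands (note 3) are not modeled.

```python
import math
import struct

FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127   # largest normal single-precision value
FLT_MIN_NORMAL = 2.0 ** -126                # smallest normal single-precision magnitude


def to_single(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfmul_lane(a: float, b: float) -> float:
    """Multiply two single-precision values following notes 1 and 2."""
    # Note 1: the result sign is the exclusive-OR of the operand signs.
    negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
    sign = -1.0 if negative else 1.0
    magnitude = abs(a * b)   # exact in double precision for single-precision inputs
    if magnitude < FLT_MIN_NORMAL:
        return math.copysign(0.0, sign)      # note 2: tiny results become signed zero
    if magnitude >= FLT_MAX:
        return sign * FLT_MAX                # assumption: clamp at the largest normal
    return sign * to_single(magnitude)


def pfmul(src1, src2):
    """Apply pfmul_lane to the (low, high) lanes of two packed operands."""
    return tuple(pfmul_lane(a, b) for a, b in zip(src1, src2))
```

For example, pfmul((2.0, 3.0), (4.0, -0.5)) yields (8.0, -1.5), while a product such as 2^-100 * 2^-100 is flushed to a zero carrying the exclusive-OR of the operand signs.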


Related Instructions 

None 

rFLAGS Affected 

None 

Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFNACC Packed Floating-Point Negative Accumulate

Subtracts the first source operand’s high-order single-precision floating-point value from its low-order 
single-precision floating-point value, subtracts the second source operand’s high-order single-precision
floating-point value from its low-order single-precision floating-point value, and writes each
result to the low-order or high-order doubleword, respectively, of the destination (first source). The 
first source/destination operand is an MMX register. The second source operand is another MMX 
register or 64-bit memory location. 

The numeric range for operands is shown in Table 1-12 on page 110. 

The PFNACC instruction is an extension to the AMD 3DNow!™ instruction set. The presence of this 
instruction set is indicated by CPUID Fn8000_0001_EDX[3DNowExt] = 1. See “CPUID” in Volume 3
for more information about the CPUID instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

HSUBPS 


Mnemonic                  Opcode       Description
PFNACC mmx1, mmx2/mem64   0F 0F /r 8A  Subtracts the packed single-precision floating-point values in an MMX register or 64-bit memory location and another MMX register and writes each value in the destination MMX register.

[Figure pfnacc.eps: in each source operand, the high doubleword (bits 63:32) is subtracted from the low doubleword (bits 31:0). The mmx1 difference is written to the low doubleword of mmx1 and the mmx2/mem64 difference to the high doubleword of mmx1.]

Table 1-12. Numeric Range of PFNACC Results

                                    High Operand (2)
Source Operand                      0             Normal             Unsupported
Low Operand (1)  0                  +/-0 (3)      - High Operand     - High Operand
                 Normal             Low Operand   Normal, +/-0 (4)   Undefined
                 Unsupported (5)    Low Operand   Undefined          Undefined


Note:

1. Least-significant floating-point value in first or second source operand.

2. Most-significant floating-point value in first or second source operand.

3. The sign is the logical AND of the sign of the low operand and the inverse of the sign of the high operand.

4. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is a zero.
If the low operand is larger in magnitude than the high operand, the sign of this zero is the same as the
sign of the low operand, else it is the inverse of the sign of the high operand. If the infinitely precise result
is exactly zero, the result is zero with the sign of the low operand. If the absolute value of the infinitely
precise result is greater than or equal to 2^128, the result is the largest normal number with the sign of
the low operand.

5. “Unsupported” means that the exponent is all ones (1s).
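The lane arrangement of PFNACC can be sketched in a few lines of Python. The function name pfnacc and the (low, high) tuple layout are assumptions made for illustration; the sketch uses ordinary Python floats and ignores the single-precision rounding, zero-sign, and clamping rules in the notes above.

```python
def pfnacc(src1, src2):
    """Return the (low, high) result lanes; each operand is a (low, high) pair."""
    low1, high1 = src1
    low2, high2 = src2
    # Low result comes from the first source, high result from the second source.
    return (low1 - high1, low2 - high2)


# Hypothetical operands: (5 - 2) goes to the low lane, (1 - 4) to the high lane.
result = pfnacc((5.0, 2.0), (1.0, 4.0))
```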


Related Instructions 

PFSUB, PFACC, PFPNACC 

rFLAGS Affected 

None 


Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD extensions to 3DNow!™ are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNowExt] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFPNACC Packed Floating-Point Positive-Negative Accumulate

Subtracts the first source operand’s high-order single-precision floating-point value from its low-order 
single-precision floating-point value, adds the two single-precision values in the second source 
operand, and writes each result to the low-order or high-order doubleword, respectively, of the 
destination (first source). The first source/destination operand is an MMX register. The second source 
operand is another MMX register or 64-bit memory location. 

The numeric range for operands is shown in Table 1-13 (for the low result) and Table 1-14 (for the 
high result), both on page 113. 

The PFPNACC instruction is an extension to the AMD 3DNow!™ instruction set. The presence of this 
instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information 
about the CPUID instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution

ADDSUBPS 


Mnemonic                  Opcode       Description
PFPNACC mmx1, mmx2/mem64  0F 0F /r 8E  Subtracts the packed single-precision floating-point values in an MMX register, adds the packed single-precision floating-point values in another MMX register or 64-bit memory location, and writes each value in the destination MMX register.

[Figure pfpnacc.eps: the high doubleword (bits 63:32) of mmx1 is subtracted from its low doubleword (bits 31:0) and the difference is written to the low doubleword of mmx1. The two doublewords of mmx2/mem64 are added and the sum is written to the high doubleword of mmx1.]

Table 1-13. Numeric Range of PFPNACC Result (Low Result)

                                    High Operand (2)
Source Operand                      0             Normal             Unsupported
Low Operand (1)  0                  +/-0 (3)      - High Operand     - High Operand
                 Normal             Low Operand   Normal, +/-0 (4)   Undefined
                 Unsupported (5)    Low Operand   Undefined          Undefined


Note:

1. Least-significant floating-point value in first or second source operand.

2. Most-significant floating-point value in first or second source operand.

3. The sign is the logical AND of the sign of the low operand and the inverse of the sign of the high operand.

4. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is a zero.
If the low operand is larger in magnitude than the high operand, the sign of this zero is the same as the
sign of the low operand, else it is the inverse of the sign of the high operand. If the infinitely precise result
is exactly zero, the result is zero with the sign of the low operand. If the absolute value of the infinitely
precise result is greater than or equal to 2^128, the result is the largest normal number with the sign of
the low operand.

5. “Unsupported” means that the exponent is all ones (1s).


Table 1-14. Numeric Range of PFPNACC Result (High Result)

                                    High Operand (2)
Source Operand                      0             Normal             Unsupported
Low Operand (1)  0                  +/-0 (3)      High Operand       High Operand
                 Normal             Low Operand   Normal, +/-0 (4)   Undefined
                 Unsupported (5)    Low Operand   Undefined          Undefined


Note:

1. Least-significant floating-point value in first or second source operand.

2. Most-significant floating-point value in first or second source operand.

3. The sign is the logical AND of the signs of the low and high operands.

4. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is zero with
the sign of the operand (low or high) that is larger in magnitude. If the infinitely precise result is exactly
zero, the result is zero with the sign of the low operand. If the absolute value of the infinitely precise
result is greater than or equal to 2^128, the result is the largest normal number with the sign of the low
operand.

5. “Unsupported” means that the exponent is all ones (1s).
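The two result lanes of PFPNACC, a difference in the low lane and a sum in the high lane, can be sketched as follows. The function name pfpnacc and the (low, high) tuple layout are assumptions made for illustration; the sketch uses ordinary Python floats and does not model the rounding, zero-sign, or clamping rules of Tables 1-13 and 1-14.

```python
def pfpnacc(src1, src2):
    """Return the (low, high) result lanes; each operand is a (low, high) pair."""
    low1, high1 = src1
    low2, high2 = src2
    # Low result: difference within the first source.
    # High result: sum within the second source.
    return (low1 - high1, low2 + high2)


# Hypothetical operands: (5 - 2) goes to the low lane, (1 + 4) to the high lane.
result = pfpnacc((5.0, 2.0), (1.0, 4.0))
```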


Related Instructions 

PFADD, PFSUB, PFACC, PFNACC 

rFLAGS Affected 

None 
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD extensions to 3DNow!™ are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNowExt] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFRCP Floating-Point Reciprocal Approximation

Computes the approximate reciprocal of the single-precision floating-point value in the low-order 32 
bits of an MMX register or 64-bit memory location and writes the result in both doublewords of 
another MMX register. The result is accurate to 14 bits. 

The PFRCP result can be forwarded to the Newton-Raphson iteration step 1 (PFRCPIT1) and Newton- 
Raphson iteration step 2 (PFRCPIT2) instructions to increase the accuracy of the reciprocal. The first 
stage of this refinement in accuracy (PFRCPIT1) requires that the input and output of the previously 
executed PFRCP instruction be used as input to the PFRCPIT1 instruction. 

The estimate contains the correct round-to-nearest value for approximately 99% of all arguments. The 
remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the- 
last-place (ulp). For details, see the data sheet or other software-optimization documentation relating 
to particular hardware implementations. 

PFRCP(x) returns 0 for x >= 2^126. The numeric range for operands is shown in Table 1-15 on
page 116. 

The PFRCP instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

RCPSS 


Mnemonic                  Opcode       Description
PFRCP mmx1, mmx2/mem64    0F 0F /r 96  Computes approximate reciprocal of single-precision floating-point value in an MMX register or 64-bit memory location and writes the result in both doublewords of the destination MMX register.

[Figure pfrcp.eps: the approximate reciprocal of the low doubleword (bits 31:0) of mmx2/mem64 is written to both doublewords (bits 63:32 and 31:0) of mmx1.]

Table 1-15. Numeric Range for the PFRCP Result

Operand                     Source 1 and Destination
Source 2   0                +/- Maximum Normal (1)
           Normal           Normal, +/-0 (2)
           Unsupported (3)  Undefined

Note:

1. The result has the same sign as the source operand.

2. If the absolute value of the result is less than 2^-126, the result is zero with the sign being the sign of the
source operand. Otherwise, the result is a normal with the sign being the same sign as the source
operand.

3. “Unsupported” means that the exponent is all ones (1s).


Examples 

The general Newton-Raphson recurrence for the reciprocal 1/b is: 

Z_{i+1} <- Z_i * (2 - b * Z_i)

The following code sequence shows the computation of a/b:

X0 = PFRCP(b)
X1 = PFRCPIT1(b, X0)
X2 = PFRCPIT2(X1, X0)
q = PFMUL(a, X2)

The 24-bit final reciprocal value is X2. The quotient is formed in the last step by multiplying the
reciprocal by the dividend a.
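The effect of one Newton-Raphson step on a coarse reciprocal estimate can be checked numerically. The Python sketch below is an illustration under stated assumptions: coarse_reciprocal is a hypothetical stand-in for PFRCP that simply rounds the true reciprocal to 14 significant bits (it does not reproduce any hardware lookup table), and refine performs in ordinary double-precision arithmetic the single step that PFRCPIT1 and PFRCPIT2 carry out together.

```python
import math


def coarse_reciprocal(b: float, bits: int = 14) -> float:
    """Stand-in for PFRCP: 1/b with the significand rounded to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / b)     # mantissa magnitude in [0.5, 1)
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def refine(b: float, z: float) -> float:
    """One Newton-Raphson step: Z_{i+1} = Z_i * (2 - b * Z_i)."""
    return z * (2.0 - b * z)


def divide(a: float, b: float) -> float:
    """Approximate a / b as a times the refined reciprocal of b."""
    x0 = coarse_reciprocal(b)
    x2 = refine(b, x0)       # one iteration roughly doubles the number of good bits
    return a * x2
```

If the estimate is (1 + e)/b, the refined value is (1 - e^2)/b, so a relative error of at most 2^-14 shrinks to at most 2^-28, which is consistent with a 24-bit final reciprocal.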

Related Instructions 

PFRCPIT1, PFRCPIT2 

rFLAGS Affected 

None 
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFRCPIT1 Packed Floating-Point Reciprocal Iteration 1

Performs the first step in the Newton-Raphson iteration to refine the reciprocal approximation 
produced by the PFRCP instruction. The first source/destination operand is an MMX register 
containing the results of two previous PFRCP instructions, and the second source operand is another 
MMX register or 64-bit memory location containing the source operands from the same PFRCP 
instructions. 

This instruction is only defined for those combinations of operands such that the first source operand 
(mmxl) is the approximate reciprocal of the second source operand (mmx2/mem64), and thus the 
range of the product, mmxl * mmx2/mem64, is (0.5, 2). The initial approximation of an operand is 
accurate to about 12 bits, and the length of the operand itself is 24 bits, so the product of these two 
operands is greater than 24 bits. PFRCPIT1 applies the one's complement of the product and rounds 
the result to 32 bits. It then compresses the result to fit into 24 bits by removing the 8 redundant most- 
significant bits after the hidden integer bit. 

The estimate contains the correct round-to-nearest value for approximately 99% of all arguments. The 
remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the- 
last-place (ulp). For details, see the data sheet or other software-optimization documentation relating 
to particular hardware implementations. 

The PFRCPIT1 instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution

PFRCP 


Mnemonic                   Opcode       Description
PFRCPIT1 mmx1, mmx2/mem64  0F 0F /r A6  Refine approximate reciprocal of result from previous PFRCP instruction.

[Figure pfrcpit1.eps: each doubleword (bits 63:32 and 31:0) of mmx1 holds a PFRCP result and the corresponding doubleword of mmx2/mem64 holds the PFRCP source. Newton-Raphson reciprocal step 1 is applied to each pair and the results are written to mmx1.]


Operation 

mmx1[31:0] = Compress(2 - mmx1[31:0] * (mmx2/mem64[31:0]) - 2^-31);
mmx1[63:32] = Compress(2 - mmx1[63:32] * (mmx2/mem64[63:32]) - 2^-31);

where:

“Compress” means discard the 8 redundant most-significant bits after the hidden integer bit.
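The final term subtracted inside Compress, read here as 2^-31, is what turns the subtraction from 2 into a one's complement. The Python sketch below checks that identity with exact rational arithmetic. It assumes a 32-bit unsigned fixed-point layout with one integer bit and 31 fraction bits, which is an interpretation of the text above rather than something the manual states; the names fixed_value and ones_complement are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 31                      # assumed layout: 1 integer bit + 31 fraction bits
WORD_MASK = (1 << 32) - 1
ULP = Fraction(1, 1 << FRACTION_BITS)   # weight of the least-significant bit, 2^-31


def fixed_value(word: int) -> Fraction:
    """Interpret a 32-bit word as an unsigned fixed-point number in [0, 2)."""
    return Fraction(word, 1 << FRACTION_BITS)


def ones_complement(word: int) -> int:
    """Invert all 32 bits of the word."""
    return ~word & WORD_MASK


# Arbitrary sample words covering products in the range (0.5, 2).
samples = [0x40000001, 0x80000000, 0xC0000000, 0xFFFFFFFF]
identity_holds = all(
    fixed_value(ones_complement(w)) == 2 - fixed_value(w) - ULP for w in samples
)
```

Under that layout, inverting the bits of a product p gives exactly 2 - p - 2^-31, so the hardware can form the Newton-Raphson correction term without a full subtractor.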

Examples 

The general Newton-Raphson recurrence for the reciprocal 1/b is: 

Z_{i+1} <- Z_i * (2 - b * Z_i)

The following code sequence computes a 24-bit approximation to a/b with one Newton-Raphson
iteration:

X0 = PFRCP(b)
X1 = PFRCPIT1(b, X0)
X2 = PFRCPIT2(X1, X0)
q = PFMUL(a, X2)

a/b is formed in the last step by multiplying the reciprocal approximation by a.

Related Instructions 

PFRCP, PFRCPIT2 

rFLAGS Affected 

None 
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFRCPIT2 Packed Floating-Point Reciprocal or Reciprocal Square Root Iteration 2

Performs the second and final step in the Newton-Raphson iteration to refine the reciprocal 
approximation produced by the PFRCP instruction or the reciprocal square-root approximation 
produced by the PFRSQRT instruction. PFRCPIT2 takes two paired elements in each source operand.
These paired elements are the results of a PFRCP and PFRCPIT1 instruction sequence or of a 
PFRSQRT and PFRSQIT1 instruction sequence. The first source/destination operand is an MMX 
register that contains the PFRCPIT1 or PFRSQIT1 results and the second source operand is another 
MMX register or 64-bit memory location that contains the PFRCP or PFRSQRT results. 

The PFRCPIT2 instruction expands the compressed PFRCPIT1 or PFRSQIT1 results from 24 to 32
bits and multiplies them by their respective source operands. An optimal correction factor is added to 
the product, which is then rounded to 24 bits. 

The estimate contains the correct round-to-nearest value for approximately 99% of all arguments. The 
remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the- 
last-place (ulp). For details, see the data sheet or other software-optimization documentation relating 
to particular hardware implementations. 

The PFRCPIT2 instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

PFRCP 


Mnemonic                   Opcode       Description
PFRCPIT2 mmx1, mmx2/mem64  0F 0F /r B6  Refines approximate reciprocal result from previous PFRCP and PFRCPIT1 instructions or from previous PFRSQRT and PFRSQIT1 instructions.

[Figure pfrcpit2.eps: each doubleword (bits 63:32 and 31:0) of mmx1 holds an iteration-1 result and the corresponding doubleword of mmx2/mem64 holds the reciprocal result. Newton-Raphson reciprocal step 2 is applied to each pair and the results are written to mmx1.]


Operation 

mmx1[31:0] = Expand(mmx1[31:0]) * mmx2/mem64[31:0];
mmx1[63:32] = Expand(mmx1[63:32]) * mmx2/mem64[63:32];

where:

“Expand” means convert a 24-bit significand to a 32-bit significand according to the following rule:

temp[31:0] = {1'b1, 8{mmx1[22]}, mmx1[22:0]};
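The Expand rule is a bit concatenation: a leading 1 (the hidden integer bit), eight copies of fraction bit 22, and then the 23 fraction bits themselves. A minimal Python sketch of that concatenation follows; the function name expand and its argument name are hypothetical, and only the significand is modeled, not the sign or exponent fields.

```python
def expand(fraction23: int) -> int:
    """Build the 32-bit significand {1, 8 copies of f[22], f[22:0]} from a 23-bit fraction."""
    fraction23 &= 0x7FFFFF                       # keep bits 22:0 only
    top_bit = (fraction23 >> 22) & 1
    replicated = 0xFF if top_bit else 0x00       # eight copies of bit 22
    return (1 << 31) | (replicated << 23) | fraction23


# A few fraction patterns and their expanded forms, shown in hexadecimal.
examples = {hex(f): hex(expand(f)) for f in (0x000000, 0x3FFFFF, 0x400000, 0x7FFFFF)}
```

This is the inverse of the Compress step used by the iteration-1 instructions, which discards those eight redundant bits.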

Examples 

The general Newton-Raphson recurrence for the reciprocal 1/b is: 

Z_{i+1} <- Z_i * (2 - b * Z_i)

The following code sequence computes a 24-bit approximation to a/b with one Newton-Raphson
iteration:

X0 = PFRCP(b)
X1 = PFRCPIT1(b, X0)
X2 = PFRCPIT2(X1, X0)
q = PFMUL(a, X2)

a/b is formed in the last step by multiplying the reciprocal approximation by a.

Related Instructions 

PFRCP, PFRCPIT1, PFRSQRT, PFRSQIT1 

rFLAGS Affected 

None 
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFRSQIT1 Packed Floating-Point Reciprocal Square Root Iteration 1

Performs the first step in the Newton-Raphson iteration to refine the reciprocal square-root 
approximation produced by the PFRSQRT instruction. The first source/destination operand is an MMX
register containing the result from a previous PFRSQRT instruction, and the second source operand is 
another MMX register or 64-bit memory location containing the source operand from the same 
PFRSQRT instruction. 

This instruction is only defined for those combinations of operands such that the first source operand 
(mmxl) is the approximate reciprocal of the second source operand (mmx2/mem64), and thus the 
range of the product, mmxl * mmx2/mem64, is (0.5, 2). The length of both operands is 24 bits, so the 
product of these two operands is greater than 24 bits. The product is normalized and then rounded to 32 
bits. The one's complement of the result is applied, a 1 is added as the most-significant bit, and the
result is renormalized. The result is then compressed to fit into 24 bits by removing 8 redundant
most-significant bits after the hidden integer bit, and the exponent is reduced by 1 to account for the
division by 2.

The PFRSQIT1 instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

PFRSQRT 


Mnemonic                   Opcode       Description
PFRSQIT1 mmx1, mmx2/mem64  0F 0F /r A7  Refines reciprocal square root approximation of previous PFRSQRT instruction.




[Figure pfrsqit1.eps: each doubleword (bits 63:32 and 31:0) of mmx1 holds a PFRSQRT result and the corresponding doubleword of mmx2/mem64 holds the PFRSQRT source. Newton-Raphson reciprocal square root step 1 is applied to each pair and the results are written to mmx1.]


Operation 

mmx1[31:0] = Compress((3 - mmx1[31:0] * (mmx2/mem64[31:0]) - 2^-31)/2);
mmx1[63:32] = Compress((3 - mmx1[63:32] * (mmx2/mem64[63:32]) - 2^-31)/2);

where:

“Compress” means discard the 8 redundant most-significant bits after the hidden integer bit.

Examples 

The following code sequence shows how the PFRSQRT and PFMUL instructions can be used to 
compute a = 1/sqrt (b): 

X0 = PFRSQRT(b)
X1 = PFMUL(X0, X0)
X2 = PFRSQIT1(b, X1)
a = PFRCPIT2(X2, X0)
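The four-step sequence above corresponds to one Newton-Raphson step for the reciprocal square root, x_new = x * (3 - b * x^2) / 2. The Python sketch below mirrors the sequence in double-precision arithmetic. It is an illustration under stated assumptions: coarse_rsqrt is a hypothetical stand-in for PFRSQRT that rounds the true value to 15 significant bits, and the Compress and Expand steps and the small one's-complement term are left out.

```python
import math


def coarse_rsqrt(b: float, bits: int = 15) -> float:
    """Stand-in for PFRSQRT: 1/sqrt(b) with the significand rounded to `bits` bits."""
    mantissa, exponent = math.frexp(1.0 / math.sqrt(b))   # mantissa in [0.5, 1)
    scale = 2 ** bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def refined_rsqrt(b: float) -> float:
    """Mirror the four-step sequence for a positive operand b."""
    x0 = coarse_rsqrt(b)            # X0 = PFRSQRT(b)
    x1 = x0 * x0                    # X1 = PFMUL(X0, X0)
    x2 = (3.0 - b * x1) / 2.0       # X2 = PFRSQIT1(b, X1), simplified
    return x2 * x0                  # a  = PFRCPIT2(X2, X0)
```

If the estimate has relative error e, the refined value has relative error of about 1.5 * e^2, so a 15-bit estimate becomes accurate to well beyond 24 bits before final rounding.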

Related Instructions 

PFRCPIT2, PFRSQRT 

rFLAGS Affected 

None 
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFRSQRT Packed Floating-Point Reciprocal Square Root Approximation

Computes the approximate reciprocal square root of the single-precision floating-point value in the 
low-order 32 bits of an MMX register or 64-bit memory location and writes the result in each 
doubleword of another MMX register. The source operand is single-precision with a 24-bit 
significand, and the result is accurate to 15 bits. Negative operands are treated as positive operands for 
purposes of reciprocal square-root computation, with the sign of the result the same as the sign of the 
source operand. 
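The sign convention described above can be stated compactly: compute the reciprocal square root of the magnitude, then attach the sign of the source. The Python sketch below is a full-precision reference for that convention only; the function name rsqrt_with_source_sign is hypothetical, and the 15-bit accuracy of the actual estimate, zero operands, and unsupported operands are not modeled.

```python
import math


def rsqrt_with_source_sign(x: float) -> float:
    """Reference 1/sqrt(|x|) carrying the sign of x, for nonzero normal x."""
    return math.copysign(1.0 / math.sqrt(abs(x)), x)


# Hypothetical inputs chosen so the results are exact.
values = [rsqrt_with_source_sign(v) for v in (4.0, -4.0, 0.25, -0.25)]
```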

This instruction can be used together with the PFRSQIT1 and PFRCPIT2 instructions to increase 
accuracy. The first stage of this refinement in accuracy (PFRSQIT1) requires that the input and output 
of the previously executed PFRSQRT instruction be used as input to the PFRSQIT1 instruction. 

The estimate contains the correct round-to-nearest value for approximately 99% of all arguments. The 
remaining arguments differ from the correct round-to-nearest value for the reciprocal by 1 unit-in-the- 
last-place (ulp). For details, see the data sheet or other software-optimization documentation relating 
to particular hardware implementations. 

The PFRSQRT instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

The numeric range for operands is shown in Table 1-16 on page 128. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

RSQRTSS 


Mnemonic                   Opcode       Description
PFRSQRT mmx1, mmx2/mem64   0F 0F /r 97  Computes approximate reciprocal square root of a packed single-precision floating-point value.

[Figure pfrsqrt.eps: the approximate reciprocal square root of the low doubleword (bits 31:0) of mmx2/mem64 is written to both doublewords (bits 63:32 and 31:0) of mmx1.]

Table 1-16. Numeric Range for the PFRSQRT Result

Operand                     Source 1 and Destination
Source 2   0                +/- Maximum Normal (1)
           Normal           Normal (1)
           Unsupported (2)  Undefined (1)

Note: 

1. The result has the same sign as the source operand. 

2. “Unsupported” means that the exponent is all ones (1s).


Examples 

The following code sequence shows how the PFRSQRT and PFMUL instructions can be used to 
compute a = 1/sqrt (b): 

X0 = PFRSQRT(b)
X1 = PFMUL(X0, X0)
X2 = PFRSQIT1(b, X1)
a = PFRCPIT2(X2, X0)

Related Instructions 

PFRCPIT2, PFRSQIT1 

rFLAGS Affected 

None 
Exceptions

Exception                                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The AMD 3DNow!™ instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                               X          A null data segment was used to reference memory.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
PFSUB Packed Floating-Point Subtract 

Subtracts each packed single-precision floating-point value in the second source operand from the 
corresponding packed single-precision floating-point value in the first source operand and writes the 
result of each subtraction in the corresponding doubleword of the destination (first source). The first 
source/destination operand is an MMX register. The second source operand is another MMX register 
or 64-bit memory location. The numeric range for operands is shown in Table 1-17 on page 131. 

The PFSUB instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their 
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

SUBPS 

Description 

Subtracts packed single-precision floating-point values in 
an MMX register or 64-bit memory location from packed 
single-precision floating-point values in another MMX 
register and writes the result in the destination MMX 
register. 


[Figure: PFSUB operation on operands mmx1 and mmx2/mem64]
Table 1-17. Numeric Range for the PFSUB Results 


Source Operand                      Source 2

                                    0             Normal               Unsupported

Source 1 and     0                  +/-0 (1)      - Source 2           - Source 2
Destination      Normal             Source 1      Normal, +/-0 (2)     Undefined
                 Unsupported (3)    Source 1      Undefined            Undefined

Note:

1. The sign of the result is the logical AND of the sign of source 1 and the inverse of the sign of source 2.

2. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is a zero. 
If the source operand that is larger in magnitude is source 1, the sign of this zero is the same as the sign 
of source 1, else it is the inverse of the sign of source 2. If the infinitely precise result is exactly zero, the 
result is zero with the sign of source 1. If the absolute value of the infinitely precise result is greater than 
or equal to 2^128, the result is the largest normal number with the sign of source 1.

3. “Unsupported” means that the exponent is all ones (1s).


Related Instructions 

PFSUBR 

rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The AMD 3DNow!™ instructions are not supported,
                                                               as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PFSUBR Packed Floating-Point Subtract Reverse 

Subtracts each packed single-precision floating-point value in the first source operand from the 
corresponding packed single-precision floating-point value in the second source operand and writes 
the result of each subtraction in the corresponding dword of the destination (first source). The first 
source/destination operand is an MMX register. The second source operand is another MMX register 
or 64-bit memory location. The numeric range for operands is shown in Table 1-18 on page 133. 
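
PFSUB and PFSUBR differ only in operand order. The Python sketch below models that difference for two single-precision lanes. The helper names (to_f32, pfsub, pfsubr) are illustrative, results are rounded to the nearest single-precision value as a simplifying assumption, and the zero and largest-normal behaviour described in the table notes is not modeled.

```python
import struct


def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pfsub(dest, src):
    """PFSUB operand order: first source minus second source, per lane."""
    return tuple(to_f32(d - s) for d, s in zip(dest, src))


def pfsubr(dest, src):
    """PFSUBR operand order: second source minus first source, per lane."""
    return tuple(to_f32(s - d) for d, s in zip(dest, src))


mmx1 = (5.0, 1.5)   # first source/destination: (low doubleword, high doubleword)
mmx2 = (2.0, 0.5)   # second source
```

With these inputs pfsub(mmx1, mmx2) yields (3.0, 1.0) and pfsubr(mmx1, mmx2) yields (-3.0, -1.0).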

The PFSUBR instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their 
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

SUBPS 


Mnemonic                     Opcode         Description

PFSUBR mmx1, mmx2/mem64      0F 0F /r AA    Subtracts packed single-precision floating-point values in
                                            an MMX register from packed single-precision floating-point
                                            values in another MMX register or 64-bit memory location
                                            and writes the result in the destination MMX register.


[Figure pfsubr.eps: corresponding doublewords (bits 63-32 and 31-0) of mmx1 and mmx2/mem64 are subtracted and the results are written to mmx1]
Table 1-18. Numeric Range for the PFSUBR Results 


Source Operand                      Source 2

                                    0             Normal               Unsupported

Source 1 and     0                  +/-0 (1)      Source 2             Source 2
Destination      Normal             - Source 1    Normal, +/-0 (2)     Undefined
                 Unsupported (3)    - Source 1    Undefined            Undefined

Note:

1. The sign is the logical AND of the sign of source 2 and the inverse of the sign of source 1.

2. If the absolute value of the infinitely precise result is less than 2^-126 (but not zero), the result is a zero. 
If the source operand that is larger in magnitude is source 2, the sign of this zero is the same as the sign 
of source 2, else it is the inverse of the sign of source 1. If the infinitely precise result is exactly zero, the 
result is zero with the sign of source 2. If the absolute value of the infinitely precise result is greater than 
or equal to 2^128, the result is the largest normal number with the sign of source 2.

3. “Unsupported” means that the exponent is all ones (1s).


Related Instructions 

PFSUB 

rFLAGS Affected 

None 


Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The AMD 3DNow!™ instructions are not supported,
                                                               as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PI2FD Packed Integer to Floating-Point Doubleword Conversion 

Converts two packed 32-bit signed integer values in an MMX register or a 64-bit memory location to 
two packed single-precision floating-point values and writes the converted values in another MMX 
register. If the result of the conversion is an inexact value, the value is truncated (rounded toward 
zero). 
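
Truncation matters only for integers that need more than 24 significant bits. The Python sketch below models one way to obtain the truncated single-precision value: convert with round-to-nearest, then step one representable value toward zero if the rounding overshot. The function names pi2fd_lane and pi2fd are illustrative assumptions, not part of the architecture.

```python
import struct


def pi2fd_lane(value):
    """Convert a signed 32-bit integer to single precision, rounding toward zero."""
    nearest = struct.unpack("<f", struct.pack("<f", float(value)))[0]
    if abs(nearest) > abs(value):
        # Round-to-nearest increased the magnitude; decrementing the bit pattern
        # moves to the next single-precision value closer to zero.
        bits = struct.unpack("<I", struct.pack("<f", nearest))[0]
        nearest = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return nearest


def pi2fd(src):
    """Convert both doublewords of the source, given as (low, high)."""
    return tuple(pi2fd_lane(v) for v in src)
```

For example, 16777219 lies between the representable values 16777218 and 16777220; round-to-nearest would give 16777220, while the truncating conversion modeled here gives 16777218.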

The PI2FD instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 

AMD no longer recommends the use of 3DNow! instructions, which have been superseded by their 
more efficient 128-bit media counterparts. For a complete list of recommended instruction 
substitutions, see Appendix A, “Recommended Substitutions for 3DNow!™ Instructions” on 
page 337. 

Recommended Instruction Substitution 

CVTDQ2PS 

Mnemonic                     Opcode         Description

PI2FD mmx1, mmx2/mem64       0F 0F /r 0D    Converts packed doubleword integers in an MMX register or
                                            64-bit memory location to single-precision floating-point
                                            values in the destination MMX register. Inexact results
                                            are truncated.

[Figure pi2fd.eps: each doubleword (bits 63-32 and 31-0) of mmx2/mem64 is converted and written to the corresponding doubleword of mmx1]


Related Instructions 

PF2ID, PF2IW, PI2FW 

rFLAGS Affected 

None 
Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The AMD 3DNow!™ instructions are not supported,
                                                               as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PI2FW Packed Integer to Floating-Point Word Conversion 

Converts two packed 16-bit signed integer values in an MMX register or a 64-bit memory location to 
two packed single-precision floating-point values and writes the converted values in another MMX 
register. 

The PI2FW instruction is an extension to the AMD 3DNow!™ instruction set. The presence of this 
instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information 
about the CPUID instruction. 


Mnemonic                     Opcode         Description

PI2FW mmx1, mmx2/mem64       0F 0F /r 0C    Converts packed 16-bit integers in an MMX register or 64-bit
                                            memory location to packed single-precision floating-point
                                            values in the destination MMX register.

[Figure pi2fw.eps: mmx2/mem64 shown as four words (bits 63-48, 47-32, 31-16, 15-0); two convert operations write the doublewords (bits 63-32 and 31-0) of mmx1]


Related Instructions 

PF2ID, PF2IW, PI2FD 


26569—Rev. 3.15—May 2018 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The AMD extensions to 3DNow!™ are not supported,
                                                               as indicated by
                                                               CPUID Fn8000_0001_EDX[3DNowExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PINSRW Packed Insert Word 

Inserts a 16-bit value from the low-order word of a 32-bit general-purpose register or a 16-bit memory 
location into an MMX register. The location in the destination register is selected by the immediate 
byte operand, as shown in Table 1-19. The other words in the destination register operand are not 
modified. 

The PINSRW instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction. 
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


Mnemonic                          Opcode         Description

PINSRW mmx, reg32/mem16, imm8     0F C4 /r ib    Inserts a 16-bit value from a general-purpose
                                                 register or memory location into an MMX
                                                 register.


[Figure pinsrw-64.eps: the low-order word (bits 15-0) of reg32/mem16 is inserted into the word of mmx (bits 63-48, 47-32, 31-16, or 15-0) selected by imm8]


Table 1-19. Immediate-Byte Operand Encoding for 64-Bit PINSRW

Immediate-Byte Bit Field    Value of Bit Field    Destination Bits Filled

1-0                         0                     15-0
                            1                     31-16
                            2                     47-32
                            3                     63-48
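
The encoding in Table 1-19 can be sketched in Python as follows. The function name pinsrw and the representation of the MMX register as a 64-bit integer are illustrative assumptions. Table 1-19 defines only bits 1-0 of the immediate byte, so this sketch masks off the remaining bits.

```python
MASK64 = (1 << 64) - 1


def pinsrw(mmx, src, imm8):
    """Insert the low-order word of src into the word of mmx selected by imm8."""
    position = imm8 & 0x3                      # bit field 1-0 of the immediate byte
    shift = 16 * position                      # 0 -> bits 15-0, ..., 3 -> bits 63-48
    cleared = mmx & ~(0xFFFF << shift) & MASK64
    return cleared | ((src & 0xFFFF) << shift)
```

For example, pinsrw(0x1111222233334444, 0xABCD5678, 2) replaces bits 47-32 with 5678h and leaves the other three words unchanged.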


Related Instructions 

PEXTRW 


rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The SSE1 instructions are not supported, as
                                                               indicated by CPUID Fn0000_0001_EDX[SSE] = 0
                                                               and the AMD extensions to the MMX™ instruction set
                                                               are not supported, as indicated by CPUID
                                                               Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMADDWD Packed Multiply Words and Add Doublewords 

Multiplies each packed 16-bit signed value in the first source operand by the corresponding packed 16- 
bit signed value in the second source operand, adds the adjacent intermediate 32-bit results of each 
multiplication (for example, the multiplication results for the adjacent bit fields 63-48 and 47-32, and 
31-16 and 15-0), and writes the 32-bit result of each addition in the corresponding doubleword of the 
destination (first source). The first source/destination operand is an MMX register and the second 
source operand is another MMX register or 64-bit memory location. 

If all four of the 16-bit source operands used to produce a 32-bit multiply-add result have the value 
8000h, the 32-bit result is 8000_0000h, which is not the correct 32-bit signed result. 
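
The multiply-add and its single overflow case can be modeled with the Python sketch below. The function names and the list representation (index 0 holds bits 15-0) are illustrative assumptions.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pmaddwd(dest_words, src_words):
    """Each argument holds four 16-bit words; returns two 32-bit doublewords."""
    products = [to_signed(a, 16) * to_signed(b, 16)
                for a, b in zip(dest_words, src_words)]
    low = (products[0] + products[1]) & 0xFFFFFFFF    # words at bits 31-16 and 15-0
    high = (products[2] + products[3]) & 0xFFFFFFFF   # words at bits 63-48 and 47-32
    return [low, high]
```

When all four words feeding one sum are 8000h, each product is 2^30 and the sum is 2^31, which does not fit in a signed doubleword; the stored pattern 8000_0000h reads back as -2^31.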

The PMADDWD instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                       Opcode       Description

PMADDWD mmx1, mmx2/mem64       0F F5 /r     Multiplies four packed 16-bit signed values in an
                                            MMX register and another MMX register or 64-bit
                                            memory location, adds intermediate results, and
                                            writes the result in the destination MMX register.

[Figure: PMADDWD operation on operands mmx1 and mmx2/mem64]



Related Instructions 

PMULHUW, PMULHW, PMULLW, PMULUDQ 

rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The MMX™ instructions are not supported, as
                                                               indicated by EDX[MMX] = 0, returned by CPUID
                                                               Fn0000_0001 or Fn8000_0001.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMAXSW Packed Maximum Signed Words 

Compares each of the packed 16-bit signed integer values in the first source operand with the 
corresponding packed 16-bit signed integer value in the second source operand and writes the 
maximum of the two values for each comparison in the corresponding word of the destination (first 
source). The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 

The PMAXSW instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction. 
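
A word-by-word signed maximum can be modeled with the Python sketch below. The helper names and the use of a 64-bit integer to stand for an MMX register are illustrative assumptions.

```python
def split_words(value):
    """Split a 64-bit value into four 16-bit words, index 0 = bits 15-0."""
    return [(value >> (16 * i)) & 0xFFFF for i in range(4)]


def join_words(words):
    """Pack four 16-bit words back into a 64-bit value."""
    return sum((w & 0xFFFF) << (16 * i) for i, w in enumerate(words))


def signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word


def pmaxsw(dest, src):
    """Keep, for each word position, the operand that is larger as a signed value."""
    pairs = zip(split_words(dest), split_words(src))
    return join_words(max(a, b, key=signed16) for a, b in pairs)
```

Because the comparison is signed, 8000h (-32768) loses to 0000h, and FFFFh (-1) loses to 0001h.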
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


Mnemonic                       Opcode       Description

PMAXSW mmx1, mmx2/mem64        0F EE /r     Compares packed signed 16-bit integer values in an MMX
                                            register and another MMX register or 64-bit memory
                                            location and writes the maximum value of each compare
                                            in the destination MMX register.


[Figure pmaxsw-64.eps: the maximum of each pair of corresponding words (bits 63-48, 47-32, 31-16, 15-0) of mmx1 and mmx2/mem64 is written to mmx1]


Related Instructions 

PMAXUB, PMINSW, PMINUB 

rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The SSE1 instructions are not supported, as
                                                               indicated by CPUID Fn0000_0001_EDX[SSE] = 0
                                                               and the AMD extensions to the MMX™ instruction set
                                                               are not supported, as indicated by CPUID
                                                               Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMAXUB Packed Maximum Unsigned Bytes 

Compares each of the packed 8-bit unsigned integer values in the first source operand with the 
corresponding packed 8-bit unsigned integer value in the second source operand and writes the 
maximum of the two values for each comparison in the corresponding byte of the destination (first 
source). The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 

The PMAXUB instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction. 
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


Mnemonic                       Opcode       Description

PMAXUB mmx1, mmx2/mem64        0F DE /r     Compares packed unsigned 8-bit integer values in an
                                            MMX register and another MMX register or 64-bit
                                            memory location and writes the maximum value of each
                                            compare in the destination MMX register.


[Figure pmaxub-64.eps: the maximum of each pair of corresponding bytes of mmx1 and mmx2/mem64 is written to mmx1]


Related Instructions 

PMAXSW, PMINSW, PMINUB 

rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The SSE1 instructions are not supported, as
                                                               indicated by CPUID Fn0000_0001_EDX[SSE] = 0
                                                               and the AMD extensions to the MMX™ instruction set
                                                               are not supported, as indicated by CPUID
                                                               Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMINSW Packed Minimum Signed Words 

Compares each of the packed 16-bit signed integer values in the first source operand with the 
corresponding packed 16-bit signed integer value in the second source operand and writes the 
minimum of the two values for each comparison in the corresponding word of the destination (first 
source). The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 

The PMINSW instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction. 
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


Mnemonic                       Opcode       Description

PMINSW mmx1, mmx2/mem64        0F EA /r     Compares packed signed 16-bit integer values in an
                                            MMX register and another MMX register or 64-bit
                                            memory location and writes the minimum value of each
                                            compare in the destination MMX register.


[Figure pminsw-64.eps: the minimum of each pair of corresponding words (bits 63-48, 47-32, 31-16, 15-0) of mmx1 and mmx2/mem64 is written to mmx1]


Related Instructions 

PMAXSW, PMAXUB, PMINUB 

rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The SSE1 instructions are not supported, as
                                                               indicated by CPUID Fn0000_0001_EDX[SSE] = 0
                                                               and the AMD extensions to the MMX™ instruction set
                                                               are not supported, as indicated by CPUID
                                                               Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMINUB Packed Minimum Unsigned Bytes 

Compares each of the packed 8-bit unsigned integer values in the first source operand with the 
corresponding packed 8-bit unsigned integer value in the second source operand and writes the 
minimum of the two values for each comparison in the corresponding byte of the destination (first 
source). The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 
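
A byte-by-byte unsigned minimum can be modeled with the Python sketch below. The function name pminub and the use of 64-bit integers for the operands are illustrative assumptions.

```python
def pminub(dest, src):
    """Keep the smaller unsigned byte from each of the eight byte positions."""
    a = dest.to_bytes(8, "little")
    b = src.to_bytes(8, "little")
    # Iterating over bytes objects yields integers in the range 0-255,
    # so min() performs an unsigned comparison.
    return int.from_bytes(bytes(min(x, y) for x, y in zip(a, b)), "little")
```

Because the comparison is unsigned, 80h (128) is larger than 7Fh (127), so the minimum of that pair is 7Fh.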

The PMINUB instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction. 
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


Mnemonic                       Opcode       Description

PMINUB mmx1, mmx2/mem64        0F DA /r     Compares packed unsigned 8-bit integer values in an
                                            MMX register and another MMX register or 64-bit
                                            memory location and writes the minimum value of each
                                            comparison in the destination MMX register.


[Figure: PMINUB operation on operands mmx1 and mmx2/mem64]



Related Instructions 

PMAXSW, PMAXUB, PMINSW 

rFLAGS Affected 

None 

Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The SSE1 instructions are not supported, as
                                                               indicated by CPUID Fn0000_0001_EDX[SSE] = 0
                                                               and the AMD extensions to the MMX™ instruction set
                                                               are not supported, as indicated by CPUID
                                                               Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMOVMSKB Packed Move Mask Byte 

Moves the most-significant bit of each byte in the source operand in bitwise order to the low-order byte 
of the destination operand. The upper 24 bits of the destination operand are cleared to zeros. The 
destination operand is a 32-bit general-purpose register and the source operand is an MMX register. 

The PMOVMSKB instruction is an AMD extension to the MMX™ instruction set and is an SSE1 
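
The mask construction can be modeled with the Python sketch below. The function name pmovmskb and the use of a 64-bit integer for the MMX register are illustrative assumptions.

```python
def pmovmskb(mmx):
    """Collect the most-significant bit of each of the eight bytes into bits 7-0."""
    mask = 0
    for i in range(8):
        msb = (mmx >> (8 * i + 7)) & 1   # bit 7 of byte i (bits 7, 15, ..., 63)
        mask |= msb << i                 # byte i contributes bit i of the result
    return mask                          # bits 31-8 of the result are zero
```

For example, a source whose highest byte is 80h and whose lowest byte is FFh, with all other bytes zero, produces the mask 81h.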
instruction. The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in 
Volume 3 for more information about the CPUID instruction. 


Mnemonic                     Opcode       Description

PMOVMSKB reg32, mmx          0F D7 /r     Moves most-significant bit of each byte in an MMX register
                                          to the low-order byte of a 32-bit general-purpose register.


[Figure pmovmskb-64.eps: the most-significant bit of each byte of mmx (bits 63, 55, 47, 39, 31, 23, 15, 7) is copied to bits 7-0 of reg32; the upper bits of reg32 are 0]


Related Instructions 

MOVMSKPD, MOVMSKPS 

rFLAGS Affected 

None 
Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The SSE1 instructions are not supported, as indicated
                                                               by CPUID Fn0000_0001_EDX[SSE] = 0 and the AMD
                                                               extensions to the MMX™ instruction set are not
                                                               supported, as indicated by CPUID
                                                               Fn8000_0001_EDX[MmxExt] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
PMULHRW Packed Multiply High Rounded Word 

Multiplies each of the four packed 16-bit signed integer values in the first source operand by the 
corresponding packed 16-bit integer value in the second source operand, adds 8000h to the lower 16 
bits of the intermediate 32-bit result of each multiplication, and writes the high-order 16 bits of each 
result in the corresponding word of the destination (first source). The addition of 8000h results in the 
rounding of the result, providing a numerically more accurate result than the PMULHW instruction, 
which truncates the result. The first source/destination operand is an MMX register. The second source 
operand is another MMX register or 64-bit memory location. 
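
The difference between the rounded and truncated high-word results can be modeled per word with the Python sketch below. The function names are illustrative, and the sketch assumes that adding 8000h to the low 16 bits carries into the high word, which is equivalent to adding 8000h to the full 32-bit product.

```python
def signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def pmulhw_word(a, b):
    """High-order 16 bits of the signed product, without rounding."""
    return ((signed16(a) * signed16(b)) >> 16) & 0xFFFF


def pmulhrw_word(a, b):
    """High-order 16 bits of the signed product after adding 8000h."""
    return ((signed16(a) * signed16(b) + 0x8000) >> 16) & 0xFFFF
```

For 3 times 4000h the product is C000h, which is 0.75 in units of the high word: the truncated result is 0 and the rounded result is 1.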

The PMULHRW instruction is an AMD 3DNow!™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                       Opcode         Description

PMULHRW mmx1, mmx2/mem64       0F 0F /r B7    Multiply 16-bit signed integer values in an MMX register
                                              and another MMX register or 64-bit memory location and
                                              write rounded result in the destination MMX register.


[Figure pmulhrw.eps: corresponding words (bits 63-48, 47-32, 31-16, 15-0) of mmx1 and mmx2/mem64 are multiplied, rounded, and written to mmx1]


Related Instructions 

None 

rFLAGS Affected 

None 
Exceptions 


Exception                    Real   Virtual 8086   Protected   Cause of Exception

Invalid opcode, #UD          X      X              X           The emulate bit (EM) of CR0 was set to 1.
                             X      X              X           The AMD 3DNow!™ instructions are not supported,
                                                               as indicated by CPUID Fn8000_0001_EDX[3DNow] = 0.
Device not available, #NM    X      X              X           The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X      X              X           A memory address exceeded the stack segment limit
                                                               or was non-canonical.
General protection, #GP      X      X              X           A memory address exceeded a data segment limit or
                                                               was non-canonical.
                                                   X           A null data segment was used to reference memory.
Page fault, #PF                     X              X           A page fault resulted from the execution of the
                                                               instruction.
x87 floating-point           X      X              X           An unmasked x87 floating-point exception was
exception pending, #MF                                         pending.
Alignment check, #AC                X              X           An unaligned memory reference was performed while
                                                               alignment checking was enabled.
PMULHUW Packed Multiply High Unsigned Word 

Multiplies each packed unsigned 16-bit value in the first source operand by the corresponding packed 
unsigned word in the second source operand and writes the high-order 16 bits of each intermediate 
32-bit result in the corresponding word of the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

The PMULHUW instruction is an AMD extension to the MMX™ instruction set and is an SSE1 
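
The unsigned high-word multiply can be modeled with the Python sketch below. The function name pmulhuw and the list representation of the four words are illustrative assumptions.

```python
def pmulhuw(dest_words, src_words):
    """Return the high-order 16 bits of each unsigned 16 x 16 -> 32-bit product."""
    return [((a & 0xFFFF) * (b & 0xFFFF)) >> 16
            for a, b in zip(dest_words, src_words)]
```

For example, FFFFh times FFFFh is FFFE_0001h as an unsigned product, so the stored word is FFFEh.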
instruction. The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in 
Volume 3 for more information about the CPUID instruction. 


Mnemonic                       Opcode       Description

PMULHUW mmx1, mmx2/mem64       0F E4 /r     Multiplies packed 16-bit values in an MMX register
                                            by the packed 16-bit values in another MMX register
                                            or 64-bit memory location and writes the high-order
                                            16 bits of each result in the destination MMX
                                            register.


[Figure pmulhuw-64.eps: corresponding words (bits 63-48, 47-32, 31-16, 15-0) of mmx1 and mmx2/mem64 are multiplied and the high-order 16 bits of each product are written to mmx1]


Related Instructions 

PMADDWD, PMULHW, PMULLW, PMULUDQ 

rFLAGS Affected 

None 


Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0 and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PMULHW Packed Multiply High Signed Word

Multiplies each packed 16-bit signed integer value in the first source operand by the corresponding 
packed 16-bit signed integer in the second source operand and writes the high-order 16 bits of the 
intermediate 32-bit result of each multiplication in the corresponding word of the destination (first 
source). The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 
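As a rough illustration of the signed case, the following C sketch models the operation described above. The name `pmulhw_model` is hypothetical, and the sketch assumes a two's-complement C implementation so that converting a word to `int16_t` reinterprets its bit pattern.

```c
#include <stdint.h>

/* Hypothetical software model of PMULHW (name is illustrative only).
 * Assumes two's-complement conversion from uint16_t to int16_t. */
uint64_t pmulhw_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        /* Reinterpret each 16-bit word as a signed integer. */
        int32_t a = (int16_t)(uint16_t)(dst >> (16 * i));
        int32_t b = (int16_t)(uint16_t)(src >> (16 * i));
        int32_t product = a * b;   /* magnitude is at most 2^30 */
        /* Take bits 31:16 of the 32-bit intermediate result. */
        uint16_t high = (uint16_t)((uint32_t)product >> 16);
        result |= (uint64_t)high << (16 * i);
    }
    return result;
}
```

With the words FFFFh (-1) and 0002h, the signed product is -2 (FFFF_FFFEh), so the high-order word is FFFFh; an unsigned multiply of the same bit patterns would instead yield 0001h.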

The PMULHW instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PMULHW mmx1, mmx2/mem64 | 0F E5 /r | Multiplies packed 16-bit signed integer values in an MMX register and another MMX register or 64-bit memory location and writes the high-order 16 bits of each result in the destination MMX register. |

[Figure: operands mmx1 and mmx2/mem64.]



Related Instructions 

PMADDWD, PMULHUW, PMULLW, PMULUDQ 

rFLAGS Affected 

None 
Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PMULLW Packed Multiply Low Signed Word

Multiplies each packed 16-bit signed integer value in the first source operand by the corresponding 
packed 16-bit signed integer in the second source operand and writes the low-order 16 bits of the 
intermediate 32-bit result of each multiplication in the corresponding word of the destination (first 
source). The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 
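The low-order selection can be sketched as follows. The name `pmullw_model` is hypothetical. The sketch multiplies the words as unsigned values, which is a simplification chosen here rather than something stated in this manual: the low-order 16 bits of a 16-bit by 16-bit product are the same whether the inputs are interpreted as signed or unsigned.

```c
#include <stdint.h>

/* Hypothetical software model of PMULLW (name is illustrative only).
 * Unsigned arithmetic is used because the low 16 bits of the product
 * do not depend on the signedness of the inputs. */
uint64_t pmullw_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t a = (uint16_t)(dst >> (16 * i));
        uint32_t b = (uint16_t)(src >> (16 * i));
        uint32_t product = a * b;   /* 32-bit intermediate result */
        /* Keep only the low-order 16 bits of the product. */
        result |= (uint64_t)(product & 0xFFFFu) << (16 * i);
    }
    return result;
}
```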

The PMULLW instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PMULLW mmx1, mmx2/mem64 | 0F D5 /r | Multiplies packed 16-bit signed integer values in an MMX register and another MMX register or 64-bit memory location and writes the low-order 16 bits of each result in the destination MMX register. |

[Figure pmullw-64.eps: each 16-bit word of mmx1 (bits 63-48, 47-32, 31-16, 15-0) is multiplied by the corresponding word of mmx2/mem64.]


Related Instructions 

PMADDWD, PMULHUW, PMULHW, PMULUDQ 

rFLAGS Affected 

None 







26569—Rev. 3.15—May 2018 




Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PMULUDQ Packed Multiply Unsigned Doubleword and Store Quadword

Multiplies two 32-bit unsigned integer values in the low-order doubleword of the first and second 
source operands and writes the 64-bit result in the destination (first source). The first 
source/destination operand is an MMX register and the second source operand is another MMX 
register or 64-bit memory location. 
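A minimal C sketch of this operation follows, assuming each operand is passed as a `uint64_t`; the name `pmuludq_model` is hypothetical. The high-order doubleword of each source is ignored.

```c
#include <stdint.h>

/* Hypothetical software model of the 64-bit PMULUDQ form
 * (name is illustrative only). */
uint64_t pmuludq_model(uint64_t dst, uint64_t src)
{
    uint64_t a = (uint32_t)dst;   /* low-order doubleword of first source */
    uint64_t b = (uint32_t)src;   /* low-order doubleword of second source */
    return a * b;                 /* full 64-bit product, cannot overflow */
}
```

Because both factors are below 2^32, the product always fits in 64 bits; for example, FFFF_FFFFh times FFFF_FFFFh gives FFFF_FFFE_0000_0001h.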

The PMULUDQ instruction is an SSE2 instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


| Mnemonic | Opcode | Description |
|---|---|---|
| PMULUDQ mmx1, mmx2/mem64 | 0F F4 /r | Multiplies the low-order 32-bit unsigned integer values in an MMX register and in another MMX register or 64-bit memory location and writes the 64-bit result in the destination MMX register. |

[Figure: operands mmx1 and mmx2/mem64.]



Related Instructions 

PMADDWD, PMULHUW, PMULHW, PMULLW 

rFLAGS Affected 

None 
Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
POR Packed Logical Bitwise OR

Performs a bitwise logical OR of the values in the first and second source operands and writes the 
result in the destination (first source). The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 64-bit memory location. 

The POR instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| POR mmx1, mmx2/mem64 | 0F EB /r | Performs bitwise logical OR of values in an MMX register and in another MMX register or 64-bit memory location and writes the result in the destination MMX register. |

[Figure por-64.eps: bits 63-0 of mmx1 are ORed with bits 63-0 of mmx2/mem64.]


Related Instructions 

PAND, PANDN, PXOR 

rFLAGS Affected 

None 
Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSADBW Packed Sum of Absolute Differences of Bytes Into a Word

Computes the absolute differences of eight corresponding packed 8-bit unsigned integers in the first 
and second source operands and writes the unsigned 16-bit integer result of the sum of the eight 
differences in a word in the destination (first source). The first source/destination operand is an MMX 
register and the second source operand is another MMX register or 64-bit memory location. The result 
is stored in the low-order word of the destination operand, and the remaining bytes in the destination 
are cleared to all 0s.
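The sum-of-absolute-differences computation described above can be sketched in C as follows. The name `psadbw_model` is hypothetical, and operands are assumed to be passed as `uint64_t` values holding eight unsigned bytes each.

```c
#include <stdint.h>

/* Hypothetical software model of the 64-bit PSADBW form
 * (name is illustrative only). */
uint64_t psadbw_model(uint64_t dst, uint64_t src)
{
    uint64_t sum = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(dst >> (8 * i));
        uint8_t b = (uint8_t)(src >> (8 * i));
        /* Absolute difference of the two unsigned bytes. */
        sum += (a > b) ? (uint64_t)(a - b) : (uint64_t)(b - a);
    }
    /* The sum is at most 8 * 255 = 2040, so it fits in the low-order
     * word and the remaining bits of the result are 0. */
    return sum;
}
```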

The PSADBW instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction.
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PSADBW mmx1, mmx2/mem64 | 0F F6 /r | Computes the sum of the absolute differences of packed 8-bit unsigned integer values in an MMX register and another MMX register or 64-bit memory location and writes the 16-bit unsigned integer result in the destination MMX register. |

[Figure psadbw-64.eps: the absolute differences of the eight byte pairs from mmx1 and mmx2/mem64 are added, and the sum is written to bits 15-0 of the 64-bit destination.]


rFLAGS Affected 

None 
Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0 and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSHUFW Packed Shuffle Words

Moves any one of the four packed words in an MMX register or 64-bit memory location to a specified 
word location in another MMX register. In each case, the selection of the value of the destination word 
is determined by a two-bit field in the immediate-byte operand, with bits 0 and 1 selecting the contents 
of the low-order word, bits 2 and 3 selecting the second word, bits 4 and 5 selecting the third word, and 
bits 6 and 7 selecting the high-order word. Refer to Table 1-20 on page 167. A word in the source 
operand may be copied to more than one word in the destination. 
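The selection rule described above can be illustrated with a short C sketch. The name `pshufw_model` is hypothetical; the source operand is assumed to be passed as a `uint64_t` and the immediate byte as a `uint8_t`.

```c
#include <stdint.h>

/* Hypothetical software model of PSHUFW (name is illustrative only).
 * Destination word i receives the source word selected by
 * immediate-byte bits (2i+1):(2i). */
uint64_t pshufw_model(uint64_t src, uint8_t imm8)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 3u;        /* two-bit field */
        uint16_t word = (uint16_t)(src >> (16 * sel));
        result |= (uint64_t)word << (16 * i);
    }
    return result;
}
```

Under this model an immediate of E4h (fields 3, 2, 1, 0 from high to low) copies the source unchanged, 1Bh reverses the order of the four words, and 00h copies the low-order source word into all four destination words.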

The PSHUFW instruction is an AMD extension to the MMX™ instruction set and is an SSE1 instruction.
The presence of this instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for 
more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PSHUFW mmx1, mmx2/mem64, imm8 | 0F 70 /r ib | Shuffles packed 16-bit values in an MMX register or 64-bit memory location and puts the result in another MMX register. |

[Figure pshufw.eps: operands mmx1 and mmx2/mem64.]
Table 1-20. Immediate-Byte Operand Encoding for PSHUFW

| Destination Bits Filled | Immediate-Byte Bit Field | Value of Bit Field | Source Bits Moved |
|---|---|---|---|
| 15-0 | 1-0 | 0 | 15-0 |
| | | 1 | 31-16 |
| | | 2 | 47-32 |
| | | 3 | 63-48 |
| 31-16 | 3-2 | 0 | 15-0 |
| | | 1 | 31-16 |
| | | 2 | 47-32 |
| | | 3 | 63-48 |
| 47-32 | 5-4 | 0 | 15-0 |
| | | 1 | 31-16 |
| | | 2 | 47-32 |
| | | 3 | 63-48 |
| 63-48 | 7-6 | 0 | 15-0 |
| | | 1 | 31-16 |
| | | 2 | 47-32 |
| | | 3 | 63-48 |


Related Instructions 

PSHUFD, PSHUFHW, PSHUFLW 


rFLAGS Affected 

None 
Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The SSE1 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE] = 0 and the AMD extensions to the MMX™ instruction set are not supported, as indicated by CPUID Fn8000_0001_EDX[MmxExt] = 0. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSLLD Packed Shift Left Logical Doublewords

Left-shifts each of the packed 32-bit values in the first source operand by the number of bits specified 
in the second source operand and writes each shifted value in the corresponding doubleword of the 
destination (first source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater 
than 31, the destination is cleared to all 0s.
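A minimal C sketch of this behavior follows. The name `pslld_model` is hypothetical, and the sketch assumes the shift count is supplied as a single unsigned 64-bit value that applies to both doublewords.

```c
#include <stdint.h>

/* Hypothetical software model of PSLLD (name is illustrative only).
 * The same count is applied to both packed doublewords. */
uint64_t pslld_model(uint64_t dst, uint64_t count)
{
    if (count > 31)
        return 0;   /* destination cleared to all 0s */
    /* Shift each doubleword on its own so that bits shifted out of the
     * low doubleword do not carry into the high doubleword. */
    uint32_t lo = (uint32_t)dst << count;
    uint32_t hi = (uint32_t)(dst >> 32) << count;
    return ((uint64_t)hi << 32) | lo;
}
```

Shifting 8000_0001_0000_0001h left by 1 under this model gives 0000_0002_0000_0002h: the top bit of each doubleword is discarded independently.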

The PSLLD instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PSLLD mmx1, mmx2/mem64 | 0F F2 /r | Left-shifts packed doublewords in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSLLD mmx, imm8 | 0F 72 /6 ib | Left-shifts packed doublewords in an MMX register by the amount specified in an immediate byte value. |

[Figure pslld-64.eps: each doubleword of mmx1 (bits 63-32, 31-0) is shifted left by the count in mmx2/mem64; in the immediate form, each doubleword of mmx is shifted left by the count in imm8 (bits 7-0).]


Related Instructions 

PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW 
rFLAGS Affected

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSLLQ Packed Shift Left Logical Quadwords

Left-shifts each 64-bit value in the first source operand by the number of bits specified in the second 
source operand and writes each shifted value in the corresponding quadword of the destination (first 
source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater 
than 63, the destination is cleared to all 0s.
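When modeling this rule in C, the out-of-range case needs an explicit test, because shifting a 64-bit value by 64 or more has undefined behavior in C rather than producing 0. A short sketch, with the hypothetical name `psllq_model`:

```c
#include <stdint.h>

/* Hypothetical software model of the 64-bit PSLLQ form
 * (name is illustrative only). */
uint64_t psllq_model(uint64_t dst, uint64_t count)
{
    /* The explicit check supplies the "cleared to all 0s" result and
     * avoids an out-of-range shift, which is undefined in C. */
    return (count > 63) ? 0 : dst << count;
}
```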

The PSLLQ instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


| Mnemonic | Opcode | Description |
|---|---|---|
| PSLLQ mmx1, mmx2/mem64 | 0F F3 /r | Left-shifts quadword in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSLLQ mmx, imm8 | 0F 73 /6 ib | Left-shifts quadword in an MMX register by the amount specified in an immediate byte value. |

[Figure psllq-64.eps: the quadword in mmx1 (bits 63-0) is shifted left by the count in mmx2/mem64; in the immediate form, the quadword in mmx is shifted left by the count in imm8 (bits 7-0).]


Related Instructions 

PSLLD, PSLLDQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW 

rFLAGS Affected 

None 
Exceptions

| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSLLW Packed Shift Left Logical Words

Left-shifts each of the packed 16-bit values in the first source operand by the number of bits specified 
in the second source operand and writes each shifted value in the corresponding word of the 
destination (first source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The low-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater 
than 15, the destination is cleared to all 0s.

The PSLLW instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


| Mnemonic | Opcode | Description |
|---|---|---|
| PSLLW mmx1, mmx2/mem64 | 0F F1 /r | Left-shifts packed words in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSLLW mmx, imm8 | 0F 71 /6 ib | Left-shifts packed words in an MMX register by the amount specified in an immediate byte value. |

[Figure psllw-64.eps: each word of mmx1 (bits 63-48, 47-32, 31-16, 15-0) is shifted left by the count in mmx2/mem64; in the immediate form, each word of mmx is shifted left by the count in imm8 (bits 7-0).]


Related Instructions 

PSLLD, PSLLDQ, PSLLQ, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW 
rFLAGS Affected

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSRAD Packed Shift Right Arithmetic Doublewords

Right-shifts each of the packed 32-bit values in the first source operand by the number of bits specified 
in the second source operand and writes each shifted value in the corresponding doubleword of the 
destination (first source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The high-order bits that are emptied by the shift operation are filled with the sign bit of the 
doubleword’s initial value. If the shift value is greater than 31, each doubleword in the destination is 
filled with the sign bit of the doubleword’s initial value. 
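The sign-fill rule can be sketched in C without relying on the implementation-defined behavior of right-shifting negative signed integers. The name `psrad_model` is hypothetical. Clamping the count to 31 is a modeling choice made here: a 31-bit arithmetic shift already leaves every bit equal to the sign bit, which matches the result described for larger counts.

```c
#include <stdint.h>

/* Hypothetical software model of PSRAD (name is illustrative only). */
uint64_t psrad_model(uint64_t dst, uint64_t count)
{
    /* Counts above 31 behave like a shift by 31: all sign bits. */
    unsigned n = (count > 31) ? 31u : (unsigned)count;
    uint64_t result = 0;
    for (int i = 0; i < 2; i++) {
        uint32_t v = (uint32_t)(dst >> (32 * i));
        /* Mask of the n vacated high-order bits, set only when the
         * doubleword's initial sign bit is 1. */
        uint32_t fill = (v & 0x80000000u) ? ~(0xFFFFFFFFu >> n) : 0u;
        result |= (uint64_t)((v >> n) | fill) << (32 * i);
    }
    return result;
}
```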

The PSRAD instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


| Mnemonic | Opcode | Description |
|---|---|---|
| PSRAD mmx1, mmx2/mem64 | 0F E2 /r | Right-shifts packed doublewords in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSRAD mmx, imm8 | 0F 72 /4 ib | Right-shifts packed doublewords in an MMX register by the amount specified in an immediate byte value. |

[Figure psrad-64.eps: each doubleword of mmx1 (bits 63-32, 31-0) is shifted right by the count in mmx2/mem64; in the immediate form, each doubleword of mmx is shifted right by the count in imm8 (bits 7-0).]




Related Instructions 

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAW, PSRLD, PSRLDQ, PSRLQ, PSRLW 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSRAW Packed Shift Right Arithmetic Words

Right-shifts each of the packed 16-bit values in the first source operand by the number of bits specified 
in the second source operand and writes each shifted value in the corresponding word of the 
destination (first source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The high-order bits that are emptied by the shift operation are filled with the sign bit of the word’s 
initial value. If the shift value is greater than 15, each word in the destination is filled with the sign bit 
of the word’s initial value. 

The PSRAW instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


| Mnemonic | Opcode | Description |
|---|---|---|
| PSRAW mmx1, mmx2/mem64 | 0F E1 /r | Right-shifts packed words in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSRAW mmx, imm8 | 0F 71 /4 ib | Right-shifts packed words in an MMX register by the amount specified in an immediate byte value. |

[Figure psraw-64.eps: each word of mmx1 (bits 63-48, 47-32, 31-16, 15-0) is arithmetically shifted right by the count in mmx2/mem64; in the immediate form, each word of mmx is arithmetically shifted right by the count in imm8 (bits 7-0).]




Related Instructions 

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRLD, PSRLDQ, PSRLQ, PSRLW 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSRLD Packed Shift Right Logical Doublewords

Right-shifts each of the packed 32-bit values in the first source operand by the number of bits specified 
in the second source operand and writes each shifted value in the corresponding doubleword of the 
destination (first source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater 
than 31, the destination is cleared to 0. 

The PSRLD instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


| Mnemonic | Opcode | Description |
|---|---|---|
| PSRLD mmx1, mmx2/mem64 | 0F D2 /r | Right-shifts packed doublewords in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSRLD mmx, imm8 | 0F 72 /2 ib | Right-shifts packed doublewords in an MMX register by the amount specified in an immediate byte value. |

[Figure psrld-64.eps: each doubleword of mmx1 (bits 63-32, 31-0) is shifted right by the count in mmx2/mem64; in the immediate form, each doubleword of mmx is shifted right by the count in imm8 (bits 7-0).]
Related Instructions

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLDQ, PSRLQ, PSRLW 

rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
PSRLQ Packed Shift Right Logical Quadwords

Right-shifts each 64-bit value in the first source operand by the number of bits specified in the second 
source operand and writes each shifted value in the corresponding quadword of the destination (first 
source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater 
than 63, the destination is cleared to 0. 

The PSRLQ instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


| Mnemonic | Opcode | Description |
|---|---|---|
| PSRLQ mmx1, mmx2/mem64 | 0F D3 /r | Right-shifts quadword in an MMX register by the amount specified in an MMX register or 64-bit memory location. |
| PSRLQ mmx, imm8 | 0F 73 /2 ib | Right-shifts quadword in an MMX register by the amount specified in an immediate byte value. |

[Figure psrlq-64.eps: the quadword in mmx1 (bits 63-0) is shifted right by the count in mmx2/mem64; in the immediate form, the quadword in mmx is shifted right by the count in imm8 (bits 7-0).]


Related Instructions 

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLW 




rFLAGS Affected 

None 

Exceptions 


| Exception | Real | Virtual 8086 | Protected | Cause of Exception |
|---|---|---|---|---|
| Invalid opcode, #UD | X | X | X | The emulate bit (EM) of CR0 was set to 1. |
| | X | X | X | The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001. |
| Device not available, #NM | X | X | X | The task-switch bit (TS) of CR0 was set to 1. |
| Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. |
| General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. |
| | | | X | A null data segment was used to reference memory. |
| Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction. |
| x87 floating-point exception pending, #MF | X | X | X | An unmasked x87 floating-point exception was pending. |
| Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled. |
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PSRLW Packed Shift Right Logical Words 

Right-shifts each of the packed 16-bit values in the first source operand by the number of bits specified 
in the second operand and writes each shifted value in the corresponding word of the destination (first 
source). The first source/destination and second source operands are: 

• an MMX register and another MMX register or 64-bit memory location, or 

• an MMX register and an immediate byte value. 

The high-order bits that are emptied by the shift operation are cleared to 0. If the shift value is greater 
than 15, the destination is cleared to 0. 
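
A minimal C sketch of the per-word behavior described above follows. The name psrlw_model is hypothetical, and the model assumes the shift count is a single value applied to all four words, as the text describes.

```c
#include <stdint.h>

/* Hypothetical model of PSRLW on a 64-bit MMX operand holding four words. */
uint64_t psrlw_model(uint64_t value, uint64_t count)
{
    uint64_t result = 0;

    /* A shift value greater than 15 clears the whole destination. */
    if (count > 15)
        return 0;

    for (int i = 0; i < 4; i++) {
        /* Isolate word i so bits never cross from one word into the next. */
        uint16_t w = (uint16_t)(value >> (16 * i));
        w = (uint16_t)(w >> count);
        result |= (uint64_t)w << (16 * i);
    }
    return result;
}
```

Shifting each word separately is what distinguishes this from PSRLQ, where bits of the upper words would move down into the lower ones.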

The PSRLW instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


Mnemonic                      Opcode        Description
PSRLW mmx1, mmx2/mem64        0F D1 /r      Right-shifts packed words in an MMX register by the amount specified in an MMX register or 64-bit memory location.
PSRLW mmx, imm8               0F 71 /2 ib   Right-shifts packed words in an MMX register by the amount specified in an immediate byte value.

[Figure psrlw-64.eps: each word of mmx1 (bits 63:48, 47:32, 31:16, 15:0) is shifted right by the count in mmx2/mem64; in the immediate form, each word of mmx is shifted right by imm8.]
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Related Instructions 

PSLLD, PSLLDQ, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD, PSRLDQ, PSRLQ 

rFLAGS Affected 

None 

Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBB Packed Subtract Bytes 

Subtracts each packed 8-bit integer value in the second source operand from the corresponding packed 
8-bit integer in the first source operand and writes the integer result of each subtraction in the 
corresponding byte of the destination (first source). The first source/destination operand is an MMX 
register and the second source operand is another MMX register or 64-bit memory location. 

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 8 bits of each 
result are written in the destination. 
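
The wrap-around behavior can be illustrated with a small C model. It is a sketch under the assumptions stated in the text above (eight independent byte lanes, no flags affected); psubb_model is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of PSUBB: a is the first source/destination,
   b is the second source. */
uint64_t psubb_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        /* Only the low-order 8 bits of the difference are kept, so a
           borrow never propagates into the neighboring byte. */
        uint8_t d = (uint8_t)(x - y);
        result |= (uint64_t)d << (8 * i);
    }
    return result;
}
```

For example, subtracting 0106h from 0005h byte by byte gives FFFFh in the two low bytes and leaves the upper bytes at 0, whereas an ordinary 64-bit subtraction would borrow through every higher byte.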

The PSUBB instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


Mnemonic                      Opcode        Description
PSUBB mmx1, mmx2/mem64        0F F8 /r      Subtracts packed byte integer values in an MMX register or 64-bit memory location from packed byte integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubb-64.eps: each byte of mmx2/mem64 is subtracted from the corresponding byte of mmx1.]


Related Instructions 

PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBD Packed Subtract Doublewords 

Subtracts each packed 32-bit integer value in the second source operand from the corresponding 
packed 32-bit integer in the first source operand and writes the integer result of each subtraction in the 
corresponding doubleword of the destination (first source). The first source/destination operand is an 
MMX register and the second source operand is another MMX register or 64-bit memory location. 

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 32 bits of each 
result are written in the destination. 
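
As a rough illustration of the two independent doubleword lanes, the following C sketch models the result described above. The function name psubd_model is hypothetical; unsigned 32-bit arithmetic in C wraps modulo 2^32, which matches keeping only the low-order 32 bits of each difference.

```c
#include <stdint.h>

/* Hypothetical model of PSUBD: a is the first source/destination,
   b is the second source. */
uint64_t psubd_model(uint64_t a, uint64_t b)
{
    /* Each lane is subtracted on its own; uint32_t arithmetic wraps, so
       an underflow in the low lane does not borrow from the high lane. */
    uint32_t lo = (uint32_t)a - (uint32_t)b;
    uint32_t hi = (uint32_t)(a >> 32) - (uint32_t)(b >> 32);

    return ((uint64_t)hi << 32) | lo;
}
```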

The PSUBD instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


Mnemonic                      Opcode        Description
PSUBD mmx1, mmx2/mem64        0F FA /r      Subtracts packed 32-bit integer values in an MMX register or 64-bit memory location from packed 32-bit integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubd-64.eps: each doubleword of mmx2/mem64 (bits 63:32 and 31:0) is subtracted from the corresponding doubleword of mmx1.]


Related Instructions 

PSUBB, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBQ Packed Subtract Quadword 

Subtracts each packed 64-bit integer value in the second source operand from the corresponding 
packed 64-bit integer in the first source operand and writes the integer result of each subtraction in the 
corresponding quadword of the destination (first source). The first source/destination and source 
operands are an MMX register and another MMX register or 64-bit memory location. 

The PSUBQ instruction is an SSE2 instruction; check the status of EDX bit 26 returned by CPUID
function 0000_0001h. See “CPUID” in Volume 3 for more information about the CPUID instruction.
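
The bit test itself is simple; a hedged C sketch is shown here. It assumes the caller has already executed CPUID function 0000_0001h by some platform-specific means and passes in the returned EDX value. The helper name edx_reports_sse2 is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: edx is the EDX value returned by CPUID function
   0000_0001h. Returns true when bit 26 (SSE2) is set. */
bool edx_reports_sse2(uint32_t edx)
{
    return ((edx >> 26) & 1u) != 0;
}
```

Keeping the CPUID call outside the helper avoids tying the sketch to any particular compiler intrinsic or inline-assembly syntax.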

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 64 bits of each 
result are written in the destination. 


Mnemonic                      Opcode        Description
PSUBQ mmx1, mmx2/mem64        0F FB /r      Subtracts packed 64-bit integer values in an MMX register or 64-bit memory location from packed 64-bit integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubq-64.eps: the quadword in mmx2/mem64 is subtracted from the quadword in mmx1.]


Related Instructions 

PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW, PSUBW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBSB Packed Subtract Signed With Saturation Bytes 

Subtracts each packed 8-bit signed integer value in the second source operand from the corresponding 
packed 8-bit signed integer in the first source operand and writes the signed integer result of each 
subtraction in the corresponding byte of the destination (first source). The first source/destination 
operand is an MMX register and the second source operand is another MMX register or 64-bit 
memory location. 

For each packed value in the destination, if the value is larger than the largest signed 8-bit integer, it is 
saturated to 7Fh, and if the value is smaller than the smallest signed 8-bit integer, it is saturated to 80h. 
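
Signed saturation can be sketched in C as follows. This is an illustrative model only; psubsb_model and sign_extend8 are hypothetical names, and the sign extension is done arithmetically so the code does not depend on implementation-defined narrowing conversions.

```c
#include <stdint.h>

/* Hypothetical helper: interprets the low 8 bits of v as a signed byte. */
static int sign_extend8(uint64_t v)
{
    int x = (int)(v & 0xFF);
    return x > 127 ? x - 256 : x;
}

/* Hypothetical model of PSUBSB: a is the first source/destination,
   b is the second source. */
uint64_t psubsb_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 8; i++) {
        int d = sign_extend8(a >> (8 * i)) - sign_extend8(b >> (8 * i));
        if (d > 127)
            d = 127;    /* larger than the largest signed byte: 7Fh */
        if (d < -128)
            d = -128;   /* smaller than the smallest signed byte: 80h */
        result |= (uint64_t)(uint8_t)d << (8 * i);
    }
    return result;
}
```

For instance, 7Fh minus FFh (127 minus -1) would be 128 and is clamped to 7Fh, while 80h minus 01h (-128 minus 1) is clamped to 80h.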

The PSUBSB instruction is an MMX™ instruction. The presence of this instruction set is indicated
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode        Description
PSUBSB mmx1, mmx2/mem64       0F E8 /r      Subtracts packed byte signed integer values in an MMX register or 64-bit memory location from packed byte integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubsb-64.eps: each byte of mmx2/mem64 is subtracted from the corresponding byte of mmx1, and each result is saturated.]


Related Instructions 

PSUBB, PSUBD, PSUBQ, PSUBSW, PSUBUSB, PSUBUSW, PSUBW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBSW Packed Subtract Signed With Saturation Words 

Subtracts each packed 16-bit signed integer value in the second source operand from the 
corresponding packed 16-bit signed integer in the first source operand and writes the signed integer 
result of each subtraction in the corresponding word of the destination (first source). The first 
source/destination and source operands are an MMX register and another MMX register or 64-bit 
memory location. 

For each packed value in the destination, if the value is larger than the largest signed 16-bit integer, it is 
saturated to 7FFFh, and if the value is smaller than the smallest signed 16-bit integer, it is saturated to 
8000h. 

The PSUBSW instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode        Description
PSUBSW mmx1, mmx2/mem64       0F E9 /r      Subtracts packed 16-bit signed integer values in an MMX register or 64-bit memory location from packed 16-bit integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubsw-64.eps: each word of mmx2/mem64 (bits 63:48, 47:32, 31:16, 15:0) is subtracted from the corresponding word of mmx1, and each result is saturated.]


Related Instructions 

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBUSB, PSUBUSW, PSUBW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBUSB Packed Subtract Unsigned and Saturate Bytes 

Subtracts each packed 8-bit unsigned integer value in the second source operand from the 
corresponding packed 8-bit unsigned integer in the first source operand and writes the unsigned 
integer result of each subtraction in the corresponding byte of the destination (first source). The first 
source/destination operand is an MMX register and the second source operand is another MMX 
register or 64-bit memory location. 

For each packed value in the destination, if the value is smaller than the smallest unsigned 8-bit
integer, it is saturated to 00h.

The PSUBUSB instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode        Description
PSUBUSB mmx1, mmx2/mem64      0F D8 /r      Subtracts packed byte unsigned integer values in an MMX register or 64-bit memory location from packed byte integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubusb-64.eps: each byte of mmx2/mem64 is subtracted from the corresponding byte of mmx1, and each result is saturated.]


Related Instructions 

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSW, PSUBW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBUSW Packed Subtract Unsigned and Saturate Words 

Subtracts each packed 16-bit unsigned integer value in the second source operand from the 
corresponding packed 16-bit unsigned integer in the first source operand and writes the unsigned 
integer result of each subtraction in the corresponding word of the destination (first source). The first 
source/destination operand is an MMX register and the second source operand is another MMX 
register or 64-bit memory location. 

For each packed value in the destination, if the value is smaller than the smallest unsigned 16-bit
integer, it is saturated to 0000h.
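
Unsigned saturation only needs a lower clamp, since a difference of two unsigned words can never exceed the largest unsigned word. The following C model is a sketch of that rule; psubusw_model is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical model of PSUBUSW: a is the first source/destination,
   b is the second source. */
uint64_t psubusw_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        uint16_t x = (uint16_t)(a >> (16 * i));
        uint16_t y = (uint16_t)(b >> (16 * i));
        /* A difference that would go below zero saturates to 0000h. */
        uint16_t d = (x > y) ? (uint16_t)(x - y) : 0;
        result |= (uint64_t)d << (16 * i);
    }
    return result;
}
```

Comparing before subtracting keeps the arithmetic in range, so the model never has to detect the underflow after the fact.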

The PSUBUSW instruction is an MMX™ instruction. The presence of this instruction set is indicated 
by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                      Opcode        Description
PSUBUSW mmx1, mmx2/mem64      0F D9 /r      Subtracts packed 16-bit unsigned integer values in an MMX register or 64-bit memory location from packed 16-bit integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubusw-64.eps: each word of mmx2/mem64 (bits 63:48, 47:32, 31:16, 15:0) is subtracted from the corresponding word of mmx1, and each result is saturated.]


Related Instructions 

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBW 

rFLAGS Affected 

None 




Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSUBW Packed Subtract Words 

Subtracts each packed 16-bit integer value in the second source operand from the corresponding 
packed 16-bit integer in the first source operand and writes the integer result of each subtraction in the 
corresponding word of the destination (first source). The first source/destination operand is an MMX 
register and the second source operand is another MMX register or 64-bit memory location. 

This instruction operates on both signed and unsigned integers. If the result overflows, the carry is 
ignored (neither the overflow nor carry bit in rFLAGS is set), and only the low-order 16 bits of the 
result are written in the destination. 

The PSUBW instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction.


Mnemonic                      Opcode        Description
PSUBW mmx1, mmx2/mem64        0F F9 /r      Subtracts packed 16-bit integer values in an MMX register or 64-bit memory location from packed 16-bit integer values in another MMX register and writes the result in the destination MMX register.

[Figure psubw-64.eps: each word of mmx2/mem64 (bits 63:48, 47:32, 31:16, 15:0) is subtracted from the corresponding word of mmx1.]


Related Instructions 

PSUBB, PSUBD, PSUBQ, PSUBSB, PSUBSW, PSUBUSB, PSUBUSW 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PSWAPD Packed Swap Doubleword 

Swaps (reverses) the two packed 32-bit values in the source operand and writes each swapped value in 
the corresponding doubleword of the destination. The source operand is an MMX register or 64-bit 
memory location. The destination is another MMX register. 
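
The swap amounts to exchanging the two 32-bit halves of the source. A minimal C sketch, with the hypothetical name pswapd_model, is:

```c
#include <stdint.h>

/* Hypothetical model of PSWAPD: returns the value written to the
   destination for a given 64-bit source operand. */
uint64_t pswapd_model(uint64_t src)
{
    /* The high doubleword moves to bits 31:0 and the low doubleword
       moves to bits 63:32. */
    return (src >> 32) | (src << 32);
}
```

Applying the model twice returns the original value, which is a quick sanity check for any implementation of the swap.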

The PSWAPD instruction is an extension to the AMD 3DNow!™ instruction set. The presence of this 
instruction set is indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information 
about the CPUID instruction. 


Mnemonic                      Opcode        Description
PSWAPD mmx1, mmx2/mem64       0F 0F /r BB   Swaps packed 32-bit values in an MMX register or 64-bit memory location and writes each value in the destination MMX register.

[Figure pswapd.eps: the two doublewords of mmx2/mem64 (bits 63:32 and 31:0) are copied to the opposite doubleword positions of mmx1.]


Related Instructions 

None 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The AMD Extensions to 3DNow!™ are not supported, as indicated by CPUID Fn8000_0001_EDX[3DNowExt] = 0.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PUNPCKHBW Unpack and Interleave High Bytes 

Unpacks the high-order bytes from the first and second source operands and packs them into 
interleaved-byte words in the destination (first source). The low-order bytes of the source operands are 
ignored. The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 64-bit memory location. 

If the second source operand is all 0s, the destination contains the bytes from the first source operand
zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned
16-bit operands for subsequent processing that requires higher precision.
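
The interleave and the zero-extension special case can be modeled in C as follows. This is a sketch with a hypothetical function name (punpckhbw_model); it assumes that, within each result word, the byte from the first source occupies the low half and the byte from the second source the high half, which is what makes an all-zero second source produce zero-extended words.

```c
#include <stdint.h>

/* Hypothetical model of PUNPCKHBW: dst is the first source/destination,
   src is the second source. */
uint64_t punpckhbw_model(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;

    for (int i = 0; i < 4; i++) {
        /* Take byte i of the high-order half (bits 63:32) of each operand. */
        uint64_t d = (dst >> (32 + 8 * i)) & 0xFF;
        uint64_t s = (src >> (32 + 8 * i)) & 0xFF;
        result |= d << (16 * i);       /* low byte of word i  */
        result |= s << (16 * i + 8);   /* high byte of word i */
    }
    return result;
}
```

Calling the model with src equal to 0 yields each high-order byte of dst widened to a 16-bit word, the zero-extension use case mentioned above.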

The PUNPCKHBW instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                      Opcode        Description
PUNPCKHBW mmx1, mmx2/mem64    0F 68 /r      Unpacks the four high-order bytes in an MMX register and another MMX register or 64-bit memory location and packs them into interleaved bytes in the destination MMX register.

[Figure punpckhbw-64.eps: the high-order bytes (bits 63:32) of mmx1 and mmx2/mem64 are interleaved into mmx1.]


Related Instructions 

PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, 
PUNPCKLQDQ, PUNPCKLWD 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PUNPCKHDQ Unpack and Interleave High Doublewords 

Unpacks the high-order doublewords from the first and second source operands and packs them into 
interleaved-doubleword quadwords in the destination (first source). The low-order doublewords of the 
source operands are ignored. The first source/destination operand is an MMX register and the second 
source operand is another MMX register or 64-bit memory location. 

If the second source operand is all 0s, the destination contains the doubleword(s) from the first source
operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to 
unsigned 64-bit operands for subsequent processing that requires higher precision. 
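
For the doubleword case the interleave reduces to combining the two high-order halves. The C sketch below uses the hypothetical name punpckhdq_model and assumes the first source's doubleword lands in bits 31:0 of the result, consistent with the zero-extension behavior described above.

```c
#include <stdint.h>

/* Hypothetical model of PUNPCKHDQ: dst is the first source/destination,
   src is the second source. */
uint64_t punpckhdq_model(uint64_t dst, uint64_t src)
{
    /* Result bits 31:0 come from dst bits 63:32; result bits 63:32 come
       from src bits 63:32. The low-order doublewords are ignored. */
    return (dst >> 32) | (src & 0xFFFFFFFF00000000ULL);
}
```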

The PUNPCKHDQ instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                      Opcode        Description
PUNPCKHDQ mmx1, mmx2/mem64    0F 6A /r      Unpacks the high-order doubleword in an MMX register and another MMX register or 64-bit memory location and packs them into interleaved doublewords in the destination MMX register.

[Figure: operands mmx1 and mmx2/mem64, each shown as doublewords in bits 63:32 and 31:0.]



Related Instructions 

PUNPCKHBW, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, 
PUNPCKLQDQ, PUNPCKLWD 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PUNPCKHWD Unpack and Interleave High Words 

Unpacks the high-order words from the first and second source operands and packs them into 
interleaved-word doublewords in the destination (first source). The low-order words of the source 
operands are ignored. The first source/destination operand is an MMX register and the second source 
operand is another MMX register or 64-bit memory location. 

If the second source operand is all 0s, the destination contains the words from the first source operand
zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned
32-bit operands for subsequent processing that requires higher precision.

The PUNPCKHWD instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                      Opcode        Description
PUNPCKHWD mmx1, mmx2/mem64    0F 69 /r      Unpacks two high-order words in an MMX register and another MMX register or 64-bit memory location and packs them into interleaved words in the destination MMX register.

[Figure: operands mmx1 and mmx2/mem64.]



Related Instructions 

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKLBW, PUNPCKLDQ, PUNPCKLQDQ, 
PUNPCKLWD 

rFLAGS Affected 

None 
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Exceptions 


Exception                                   Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                         X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                            X     X             X          The MMX™ instructions are not supported, as indicated by EDX[MMX] = 0, returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                   X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                  X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                X          A null data segment was used to reference memory.
Page fault, #PF                                   X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF   X     X             X          An unmasked x87 floating-point exception was pending.
Alignment check, #AC                              X             X          An unaligned memory reference was performed while alignment checking was enabled.
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PUNPCKLBW Unpack and Interleave Low Bytes 

Unpacks the low-order bytes from the first and second source operands and packs them into 
interleaved-byte words in the destination (first source). The high-order bytes of the source operands 
are ignored. The first source/destination operand is an MMX register and the second source operand is 
another MMX register or 32-bit memory location. 

If the second source operand is all 0s, the destination contains the bytes from the first source operand
zero-extended to 16 bits. This operation is useful for expanding unsigned 8-bit values to unsigned
16-bit operands for subsequent processing that requires higher precision.
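The byte interleave can be sketched with integer shifts and masks. This is an illustrative model only; the function name `punpcklbw` and the representation of MMX registers as Python integers are assumptions of the example.

```python
def punpcklbw(dst: int, src: int) -> int:
    """Model PUNPCKLBW on 64-bit integers (dst is the first source, src the second)."""
    result = 0
    for i in range(4):                         # only the four low-order bytes are read
        d = (dst >> (8 * i)) & 0xFF            # byte i of the first source
        s = (src >> (8 * i)) & 0xFF            # byte i of the second source
        result |= (d | (s << 8)) << (16 * i)   # word i: second-source byte above first-source byte
    return result


# A second source of all 0s widens each unsigned byte to an unsigned 16-bit word.
widened = punpcklbw(0x8877665544332211, 0)
```

The four words of `widened` are 0x0011, 0x0022, 0x0033, and 0x0044, so 8-bit values can be summed without overflowing a byte.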

The PUNPCKLBW instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                      Opcode      Description
PUNPCKLBW mmx1, mmx2/mem32    0F 60 /r    Unpacks the four low-order bytes in an MMX
                                          register and another MMX register or 32-bit
                                          memory location and packs them into interleaved
                                          bytes in the destination MMX register.

[Figure: PUNPCKLBW operands mmx1 and mmx2/mem64]



Related Instructions 

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLDQ, 
PUNPCKLQDQ, PUNPCKLWD 

rFLAGS Affected 

None 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X         X            X      The emulate bit (EM) of CR0 was set to 1.
                             X         X            X      The MMX™ instructions are not supported, as
                                                           indicated by EDX[MMX] = 0, returned by CPUID
                                                           Fn0000_0001 or Fn8000_0001.
Device not available, #NM    X         X            X      The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit
                                                           or was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or
                                                           was non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the
                                                           instruction.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was
exception pending, #MF                                     pending.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
PUNPCKLDQ Unpack and Interleave Low Doublewords

Unpacks the low-order doublewords from the first and second source operands and packs them into 
interleaved-doubleword quadwords in the destination (first source). The high-order doublewords of 
the source operands are ignored. The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 32-bit memory location. 

If the second source operand is all 0s, the destination contains the doubleword(s) from the first source
operand zero-extended to 64 bits. This operation is useful for expanding unsigned 32-bit values to
unsigned 64-bit operands for subsequent processing that requires higher precision.
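For the doubleword form the interleave reduces to placing the two low-order doublewords side by side. A minimal sketch, assuming Python integers stand in for the MMX operands (the name `punpckldq` is illustrative):

```python
MASK32 = 0xFFFFFFFF


def punpckldq(dst: int, src: int) -> int:
    """Model PUNPCKLDQ: low doubleword of dst stays low, low doubleword of src goes high."""
    low_first = dst & MASK32     # high-order doubleword of the first source is ignored
    low_second = src & MASK32    # high-order doubleword of the second source is ignored
    return low_first | (low_second << 32)


# With an all-0s second source the low doubleword is zero-extended to 64 bits.
widened = punpckldq(0xAAAAAAAA12345678, 0)
```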

The PUNPCKLDQ instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                      Opcode      Description
PUNPCKLDQ mmx1, mmx2/mem32    0F 62 /r    Unpacks the low-order doubleword in an MMX register
                                          and another MMX register or 32-bit memory location
                                          and packs them into interleaved doublewords in the
                                          destination MMX register.



Related Instructions 

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, 
PUNPCKLQDQ, PUNPCKLWD 

rFLAGS Affected 

None 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X         X            X      The emulate bit (EM) of CR0 was set to 1.
                             X         X            X      The MMX™ instructions are not supported, as
                                                           indicated by EDX[MMX] = 0, returned by CPUID
                                                           Fn0000_0001 or Fn8000_0001.
Device not available, #NM    X         X            X      The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit
                                                           or was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or
                                                           was non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the
                                                           instruction.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was
exception pending, #MF                                     pending.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
PUNPCKLWD Unpack and Interleave Low Words

Unpacks the low-order words from the first and second source operands and packs them into 
interleaved-word doublewords in the destination (first source). The high-order words of the source 
operands are ignored. The first source/destination operand is an MMX register and the second source 
operand is another MMX register or 32-bit memory location. 

If the second source operand is all 0s, the destination contains the words from the first source operand
zero-extended to 32 bits. This operation is useful for expanding unsigned 16-bit values to unsigned
32-bit operands for subsequent processing that requires higher precision.
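A common use of the zero-extension case is to widen all four words of a register by pairing the low-word and high-word unpack forms. The sketch below is illustrative; the function names and the helper `widen_words` are assumptions, and Python integers stand in for 64-bit MMX registers.

```python
def punpcklwd(dst: int, src: int) -> int:
    """Model PUNPCKLWD: interleave the two low-order words of dst and src."""
    d0, d1 = dst & 0xFFFF, (dst >> 16) & 0xFFFF
    s0, s1 = src & 0xFFFF, (src >> 16) & 0xFFFF
    return d0 | (s0 << 16) | (d1 << 32) | (s1 << 48)


def punpckhwd(dst: int, src: int) -> int:
    """Model PUNPCKHWD: interleave the two high-order words of dst and src."""
    return punpcklwd(dst >> 32, src >> 32)


def widen_words(value: int):
    """Hypothetical helper: zero-extend four packed words into two registers of two doublewords."""
    return punpcklwd(value, 0), punpckhwd(value, 0)


low_pair, high_pair = widen_words(0x4444333322221111)
```

`low_pair` holds the doublewords 0x00001111 and 0x00002222, and `high_pair` holds 0x00003333 and 0x00004444.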

The PUNPCKLWD instruction is an MMX™ instruction. The presence of this instruction set is 
indicated by CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID 
instruction. 


Mnemonic                      Opcode      Description
PUNPCKLWD mmx1, mmx2/mem32    0F 61 /r    Unpacks the two low-order words in an MMX
                                          register and another MMX register or 32-bit memory
                                          location and packs them into interleaved words in
                                          the destination MMX register.



Related Instructions 

PUNPCKHBW, PUNPCKHDQ, PUNPCKHQDQ, PUNPCKHWD, PUNPCKLBW, PUNPCKLDQ, 
PUNPCKLQDQ 

rFLAGS Affected 

None 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X         X            X      The emulate bit (EM) of CR0 was set to 1.
                             X         X            X      The MMX™ instructions are not supported, as
                                                           indicated by EDX[MMX] = 0, returned by CPUID
                                                           Fn0000_0001 or Fn8000_0001.
Device not available, #NM    X         X            X      The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit
                                                           or was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or
                                                           was non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the
                                                           instruction.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was
exception pending, #MF                                     pending.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
PXOR Packed Logical Bitwise Exclusive OR

Performs a bitwise exclusive OR of the values in the first and second source operands and writes the 
result in the destination (first source). The first source/destination operand is an MMX register and the 
second source operand is another MMX register or 64-bit memory location. 
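The operation is a plain 64-bit exclusive OR. A minimal sketch, assuming Python integers stand in for the MMX operands (the name `pxor` is illustrative):

```python
MASK64 = (1 << 64) - 1


def pxor(dst: int, src: int) -> int:
    """Model PXOR: bitwise exclusive OR of two 64-bit values."""
    return (dst ^ src) & MASK64


a = 0x0123456789ABCDEF
key = 0xFFFF0000FFFF0000
cleared = pxor(a, a)               # XOR of a value with itself is all 0s
restored = pxor(pxor(a, key), key)  # applying the same XOR twice restores the value
```

Because any value XORed with itself is zero, using the same register for both operands is a common way to clear an MMX register.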

The PXOR instruction is an MMX™ instruction. The presence of this instruction set is indicated by 
CPUID feature bits. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic                 Opcode      Description
PXOR mmx1, mmx2/mem64    0F EF /r    Performs bitwise logical XOR of values in an MMX register
                                     and in another MMX register or 64-bit memory location
                                     and writes the result in the destination MMX register.

[Figure pxor-64.eps: bits 63-0 of mmx1 XOR bits 63-0 of mmx2/mem64]


Related Instructions 

PAND, PANDN, POR 

rFLAGS Affected 

None 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X         X            X      The emulate bit (EM) of CR0 was set to 1.
                             X         X            X      The MMX™ instructions are not supported, as
                                                           indicated by EDX[MMX] = 0, returned by CPUID
                                                           Fn0000_0001 or Fn8000_0001.
Device not available, #NM    X         X            X      The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit
                                                           or was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or
                                                           was non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the
                                                           instruction.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was
exception pending, #MF                                     pending.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.



26569 — Rev. 3.15—May 2018 AMD64 Technology 

2 x87 Floating-Point Instruction Reference 


This chapter describes the function, mnemonic syntax, opcodes, condition codes, affected flags, and 
possible exceptions generated by the x87 floating-point instructions. The x87 floating-point 
instructions are used in legacy floating-point applications. Most of these instructions load, store, or 
operate on data located in the x87 ST(0)-ST(7) stack registers (the FPR0-FPR7 physical registers). 
The remaining instructions within this category are used to manage the x87 floating-point
environment.

The AMD64 architecture requires support of the x87 floating-point instruction subset including the
floating-point conditional moves and the FCOMI(P) and FUCOMI(P) instructions. On compliant
processor implementations, both the FPU and the CMOV feature flags are set. These are indicated by
EDX[FPU] (bit 0) and EDX[CMOV] (bit 15), respectively, returned by CPUID Fn0000_0001 or
CPUID Fn8000_0001.

This subset is augmented by instructions that are members of the MMX, 3DNow!™, SSE3, and FXSR
subsets. Support for the following instructions is implementation-specific:

• EMMS, which is an MMX instruction. Support for this instruction is indicated by
CPUID Fn0000_0001_EDX[MMX] = 1 or CPUID Fn8000_0001_EDX[MMX] = 1.

• FEMMS, which is a 3DNow!™ instruction. Support for this instruction is indicated by
CPUID Fn8000_0001_EDX[3DNow] = 1.

• FISTTP, which is an SSE3 instruction. Support for this instruction is indicated by
CPUID Fn0000_0001_ECX[SSE3] = 1.

• FXSAVE / FXRSTOR. Support for these instructions is indicated by
CPUID Fn8000_0001_EDX[FXSR] = 1 or CPUID Fn0000_0001_EDX[FXSR] = 1.

EMMS and FEMMS are described in Chapter 1, “64-Bit Media Instruction Reference”, on page 1. 

The x87 instructions can be used in legacy mode or long mode. Their use in long mode is available if 
the following feature bit is set: 

• Long Mode, as indicated by CPUID Fn8000_0001_EDX[LM] = 1. 
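The feature checks listed in this chapter introduction amount to testing individual bits of the registers returned by CPUID. The sketch below decodes register values that have already been obtained by some other means; it does not execute CPUID itself. The function names are hypothetical. Only the FPU (bit 0) and CMOV (bit 15) positions are stated in this chapter; the other bit positions (MMX 23, FXSR 24, SSE3 ECX bit 0, 3DNow! 31, LM 29) are taken from the CPUID description and should be verified there.

```python
def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def x87_feature_summary(fn1_ecx: int, fn1_edx: int, fn8_edx: int) -> dict:
    """Decode feature flags from CPUID Fn0000_0001 (ECX, EDX) and Fn8000_0001 (EDX)."""
    return {
        "FPU": bit(fn1_edx, 0),
        "CMOV": bit(fn1_edx, 15),
        "MMX": bit(fn1_edx, 23) or bit(fn8_edx, 23),
        "FXSR": bit(fn1_edx, 24) or bit(fn8_edx, 24),
        "SSE3": bit(fn1_ecx, 0),
        "3DNow": bit(fn8_edx, 31),   # reported only by Fn8000_0001
        "LM": bit(fn8_edx, 29),      # long mode, reported only by Fn8000_0001
    }


# Example register values chosen for illustration, not read from real hardware.
features = x87_feature_summary(fn1_ecx=0x1, fn1_edx=0x01808001, fn8_edx=0xA0000000)
fcmov_supported = features["FPU"] and features["CMOV"]
```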

Compilation of x87 media programs for execution in 64-bit mode offers two primary advantages: 
access to the 64-bit virtual address space and access to the RIP-relative addressing mode. 

For further information about the x87 floating-point instructions and register resources, see: 

• “x87 Floating-Point Programming” in Volume 1. 

• “SSE, MMX, and x87 Programming” in Volume 2. 

• “Summary of Registers and Data Types” in Volume 3. 

• “Notation” in Volume 3. 

• “Instruction Prefixes” in Volume 3. 

For information on using the CPUID instruction, see the instruction description in Volume 3. 
F2XM1 Floating-Point Compute $2^x - 1$

Raises 2 to the power specified by the value in ST(0), subtracts 1, and stores the result in ST(0). The 
source value must be in the range -1.0 to +1.0. The result is undefined for source values outside this 
range. 

This instruction, when used in conjunction with the FYL2X instruction, can be applied to calculate
$z = x^y$ by taking advantage of the log property $x^y = 2^{y \cdot \log_2 x}$.
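The power computation can be sketched numerically. Because the F2XM1 source must lie in the range -1.0 to +1.0, the exponent is split into an integer part and a fraction. The split by rounding and the final scaling step (done here with `math.ldexp`) are assumptions of this sketch, as are the function names; Python doubles stand in for the double-extended format.

```python
import math


def f2xm1(x: float) -> float:
    """Model F2XM1: 2**x - 1 for x in [-1.0, +1.0]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("source outside -1.0 to +1.0; the instruction result is undefined")
    return math.expm1(x * math.log(2.0))


def fyl2x(y: float, x: float) -> float:
    """Model FYL2X: y * log2(x)."""
    return y * math.log2(x)


def power(x: float, y: float) -> float:
    """Compute x**y as 2**(y * log2(x)) for x > 0."""
    t = fyl2x(y, x)
    n = round(t)                 # integer part of the exponent
    frac = t - n                 # |frac| <= 0.5, inside the valid F2XM1 range
    return math.ldexp(f2xm1(frac) + 1.0, n)   # (2**frac) * 2**n


z = power(2.0, 10.0)
```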

Mnemonic    Opcode    Description
F2XM1       D9 F0     Replace ST(0) with ($2^{ST(0)} - 1$).

Related Instructions 

FYL2X, FYL2XP1 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    0        No precision exception occurred.
                      0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      A source operand was an SNaN value or an unsupported
exception (IE)                                             format.
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
Denormalized-operand         X         X            X      A source operand was a denormal value.
exception (DE)
Underflow exception (UE)     X         X            X      A rounded result was too small to fit into the format of the
                                                           destination operand.
Precision exception (PE)     X         X            X      A result could not be represented exactly in the destination
                                                           format.
FABS Floating-Point Absolute Value

Converts the value in ST(0) to its absolute value by clearing the sign bit. The resulting value depends 
upon the type of number used as the source value: 


Source Value (ST(0))    Result (ST(0))
-∞                      +∞
-FiniteReal             +FiniteReal
-0                      +0
+0                      +0
+FiniteReal             +FiniteReal
+∞                      +∞
NaN                     NaN


This operation applies even if the value in ST(0) is negative zero or negative infinity. 
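Clearing the sign bit can be illustrated on the bit pattern of a floating-point value. The sketch below uses a 64-bit IEEE double as a stand-in for the 80-bit double-extended format held in ST(0); that substitution and the name `fabs_bits` are assumptions of the example.

```python
import math
import struct


def fabs_bits(value: float) -> float:
    """Clear the sign bit of a double, leaving exponent and significand untouched."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits &= 0x7FFFFFFFFFFFFFFF          # bit 63 is the sign bit of a double
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


from_negative_zero = fabs_bits(-0.0)
from_negative_infinity = fabs_bits(float("-inf"))
```

Because only the sign bit changes, negative zero becomes positive zero, negative infinity becomes positive infinity, and a NaN remains a NaN.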


Mnemonic    Opcode    Description
FABS        D9 E1     Replace ST(0) with its absolute value.

Related Instructions 

FPREM, FRNDINT, FXTRACT, FCHS 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    0
C2                    U
C3                    U


Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
FADD Floating-Point Add

FADDP 

FIADD 

Adds two values and stores the result in a floating-point register. If two operands are specified, the 
values are in ST(0) and another floating-point register and the instruction stores the result in the first 
register specified. If one operand is specified, the instruction adds the 32-bit or 64-bit value in the 
specified memory location to the value in ST(0). 

The FADDP instruction adds the value in ST(0) to the value in another floating-point register and pops 
the register stack. If two operands are specified, the first operand is the other register. If no operand is 
specified, then the other register is ST(1). 

The FIADD instruction reads a 16-bit or 32-bit signed integer value from the specified memory 
location, converts it to double-extended-real format, and adds it to the value in ST(0). 
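The effect of the three forms on the register stack can be sketched with a toy model. The class `X87Stack` and its method names are hypothetical, Python floats stand in for double-extended values, and exceptions, tag bits, and the eight-register limit are not modeled.

```python
class X87Stack:
    """Toy model of the x87 register stack; regs[-1] plays the role of ST(0)."""

    def __init__(self):
        self.regs = []

    def push(self, value):
        self.regs.append(float(value))

    def st(self, i):
        return self.regs[-1 - i]

    def fadd(self, dst, src):
        # FADD ST(dst),ST(src): one of the two operands is ST(0).
        self.regs[-1 - dst] = self.st(dst) + self.st(src)

    def fadd_mem(self, mem_real):
        # FADD mem32real / mem64real: ST(0) <- ST(0) + memory value.
        self.regs[-1] = self.st(0) + float(mem_real)

    def faddp(self, i=1):
        # FADDP ST(i),ST(0): ST(i) <- ST(0) + ST(i), then pop the stack.
        self.regs[-1 - i] = self.st(0) + self.st(i)
        self.regs.pop()

    def fiadd(self, mem_int):
        # FIADD: the signed integer is converted to floating point before the add.
        self.regs[-1] = self.st(0) + float(int(mem_int))


stack = X87Stack()
stack.push(1.5)      # becomes ST(1) after the next push
stack.push(2.0)      # ST(0)
stack.faddp()        # ST(1) <- 2.0 + 1.5, then pop: ST(0) is now 3.5
stack.fiadd(4)       # ST(0) <- 3.5 + 4.0
```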


Mnemonic             Opcode     Description
FADD ST(0),ST(i)     D8 C0+i    Replace ST(0) with ST(0) + ST(i).
FADD ST(i),ST(0)     DC C0+i    Replace ST(i) with ST(0) + ST(i).
FADD mem32real       D8 /0      Replace ST(0) with ST(0) + mem32real.
FADD mem64real       DC /0      Replace ST(0) with ST(0) + mem64real.
FADDP                DE C1      Replace ST(1) with ST(0) + ST(1), and pop the x87 register stack.
FADDP ST(i),ST(0)    DE C0+i    Replace ST(i) with ST(0) + ST(i), and pop the x87 register stack.
FIADD mem16int       DE /0      Replace ST(0) with ST(0) + mem16int.
FIADD mem32int       DA /0      Replace ST(0) with ST(0) + mem32int.


Related Instructions 

None 

rFLAGS Affected 

None 
x87 Condition Code


x87 Condition Code    Value    Description
C0                    U
C1                    0        No precision exception occurred.
                      0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit or
                                                           was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or was
                                                           non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the
                                                           instruction.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      A source operand was an SNaN value or an unsupported
exception (IE)                                             format.
                             X         X            X      +infinity was added to -infinity.
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
Denormalized-operand         X         X            X      A source operand was a denormal value.
exception (DE)
Overflow exception (OE)      X         X            X      A rounded result was too large to fit into the format of the
                                                           destination operand.
Underflow exception (UE)     X         X            X      A rounded result was too small to fit into the format of the
                                                           destination operand.
Precision exception (PE)     X         X            X      A result could not be represented exactly in the
                                                           destination format.
FBLD Floating-Point Load Binary-Coded Decimal

Converts a 10-byte packed BCD value in memory into double-extended-precision format, and pushes 
the result onto the x87 stack. In the process, it preserves the sign of the source value. 

The packed BCD digits should be in the range 0 to 9. Attempting to load invalid digits (Ah through Fh) 
produces undefined results. 
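The conversion can be sketched as a decoder for the 10-byte operand. The layout assumed here is the usual packed-BCD one: bytes 0 through 8 hold 18 digits, two per byte with the less significant digit in the low nibble, and bit 7 of byte 9 holds the sign. The function name `fbld` and the choice to raise an error on invalid digits are assumptions of this example. A Python double is used for the result, so magnitudes above 2**53 may round in this model, whereas the double-extended format holds every 18-digit integer exactly.

```python
def fbld(mem80dec: bytes) -> float:
    """Decode a 10-byte packed BCD value (little-endian) into a float."""
    if len(mem80dec) != 10:
        raise ValueError("packed BCD operand must be 10 bytes")
    magnitude = 0
    for byte in reversed(mem80dec[:9]):        # most significant digit pair first
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("invalid BCD digit (Ah through Fh); hardware result is undefined")
        magnitude = magnitude * 100 + high * 10 + low
    negative = bool(mem80dec[9] & 0x80)        # sign bit of the source is preserved
    return -float(magnitude) if negative else float(magnitude)


value = fbld(bytes([0x45, 0x23, 0x01, 0, 0, 0, 0, 0, 0, 0x80]))
```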

Mnemonic         Opcode    Description
FBLD mem80dec    DF /4     Convert a packed BCD value to floating-point and push the
                           result onto the x87 register stack.

Related Instructions

FBSTP

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    1        x87 stack overflow, if an x87 register stack fault was detected.
                      0        If no other flags are set.
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit or
                                                           was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or was
                                                           non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the instruction.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      An x87 stack overflow occurred.
exception (IE) with
stack fault (SF)
FBSTP Floating-Point Store Binary-Coded Decimal and Pop

Converts the value in ST(0) to an 18-digit packed BCD integer, stores the result in the specified 
memory location, and pops the register stack. It rounds a non-integral value to an integer value, 
depending on the rounding mode specified by the RC field of the x87 control word. 

The operand specifies the memory address of the first byte of the resulting 10-byte value. 
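The store direction can be sketched as an encoder. The layout assumed is the usual packed-BCD one (18 digits in bytes 0 through 8, sign in bit 7 of byte 9). Rounding uses Python's `round`, which rounds to nearest with ties to even and so corresponds only to the round-to-nearest RC setting; other RC settings are not modeled. The function name `fbstp` and the use of Python exceptions in place of the invalid-operation exception are assumptions of this example.

```python
import math


def fbstp(value: float) -> bytes:
    """Encode a float as a 10-byte packed BCD integer (round-to-nearest-even only)."""
    rounded = round(value)                     # raises for NaN and infinity
    if abs(rounded) >= 10 ** 18:
        raise OverflowError("value does not fit in 18 BCD digits")
    digits = abs(rounded)
    out = bytearray(10)
    for i in range(9):                         # two digits per byte, least significant first
        digits, pair = divmod(digits, 100)
        out[i] = ((pair // 10) << 4) | (pair % 10)
    if math.copysign(1.0, value) < 0:
        out[9] = 0x80                          # sign bit in the most significant byte
    return bytes(out)


encoded = fbstp(-12345.0)
```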


Mnemonic          Opcode    Description
FBSTP mem80dec    DF /6     Convert the floating-point value in ST(0) to BCD, store the result in
                            mem80, and pop the x87 register stack.

Related Instructions

FBLD

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    0        No precision exception occurred.
                      0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit or
                                                           was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or was
                                                           non-canonical.
                                                    X      The destination operand was in a nonwritable segment.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the instruction.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      A source operand was an SNaN value, a QNaN value,
exception (IE)                                             ±infinity or an unsupported format.
                             X         X            X      A source operand was too large to fit in the destination
                                                           format.
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
Precision exception (PE)     X         X            X      A result could not be represented exactly in the destination
                                                           format.
FCHS Floating-Point Change Sign

Complements the sign bit of ST(0), changing the value from negative to positive or vice versa. This
operation applies to positive and negative floating-point values, as well as -0 and +0, NaNs, and +∞
and -∞.
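Complementing the sign bit can be illustrated on the bit pattern of a floating-point value. The sketch uses a 64-bit IEEE double as a stand-in for the 80-bit double-extended format in ST(0); that substitution and the name `fchs_bits` are assumptions of the example.

```python
import math
import struct


def fchs_bits(value: float) -> float:
    """Complement the sign bit of a double, leaving exponent and significand untouched."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 0x8000000000000000          # bit 63 is the sign bit of a double
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


negated_zero = fchs_bits(0.0)                  # +0 becomes -0
negated_infinity = fchs_bits(float("-inf"))    # -infinity becomes +infinity
```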

Mnemonic    Opcode    Description
FCHS        D9 E0     Reverse the sign bit of ST(0).

Related Instructions

FABS, FPREM, FRNDINT, FXTRACT

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    0
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions 


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
FCLEX Floating-Point Clear Flags

(FNCLEX)

Clears the following flags in the x87 status word: 

• Floating-point exception flags (PE, UE, OE, ZE, DE, and IE) 

• Stack fault flag (SF) 

• Exception summary status flag (ES) 

• Busy flag (B) 

It leaves the four condition-code bits undefined. It does not check for possible floating-point 
exceptions before clearing the flags. 

Assemblers usually provide an FCLEX macro that expands into the instruction sequence 

WAIT           ; Opcode 9B

FNCLEX         ; Opcode DB E2

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler, if 
necessary. The FNCLEX instruction then clears all the relevant x87 exception flags. 
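The effect on the x87 status word can be sketched as a bit mask. The bit positions used here (IE 0, DE 1, ZE 2, OE 3, UE 4, PE 5, SF 6, ES 7, B 15) follow the standard status-word layout, which is not spelled out in this section. The function name `fnclex` is illustrative. The condition-code bits are left unchanged in this model, although the instruction leaves them undefined.

```python
IE, DE, ZE, OE, UE, PE = (1 << n for n in range(6))   # exception flags, bits 0-5
SF = 1 << 6                                           # stack fault
ES = 1 << 7                                           # exception summary status
B = 1 << 15                                           # busy

CLEARED_BY_FNCLEX = IE | DE | ZE | OE | UE | PE | SF | ES | B


def fnclex(status_word: int) -> int:
    """Return the status word with the exception, SF, ES, and B flags cleared."""
    return status_word & ~CLEARED_BY_FNCLEX & 0xFFFF


after = fnclex(0xFFFF)   # TOP and the condition-code bits survive in this model
```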

Mnemonic    Opcode      Description
FCLEX       9B DB E2    Perform a WAIT (9B) to check for pending floating-point
                        exceptions, and then clear the floating-point exception flags.
FNCLEX      DB E2       Clear the floating-point flags without checking for pending
                        unmasked floating-point exceptions.

Related Instructions

WAIT

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    U
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
FCMOVcc Floating-Point Conditional Move

Tests the flags in the rFLAGS register and, depending upon the values encountered, moves the value in 
another stack register to ST(0). 

This set of instructions includes the mnemonics FCMOVB, FCMOVBE, FCMOVE, FCMOVNB, 
FCMOVNBE, FCMOVNE, FCMOVNU, and FCMOVU. 

Support for the FCMOVcc instruction is indicated when both EDX[FPU] (bit 0) and EDX[CMOV]
(bit 15) are set, as returned by either CPUID function 0000_0001h or function 8000_0001h.


Mnemonic                Opcode     Description
FCMOVB ST(0),ST(i)      DA C0+i    Move the contents of ST(i) into ST(0) if below (CF = 1).
FCMOVBE ST(0),ST(i)     DA D0+i    Move the contents of ST(i) into ST(0) if below or equal
                                   (CF = 1 or ZF = 1).
FCMOVE ST(0),ST(i)      DA C8+i    Move the contents of ST(i) into ST(0) if equal (ZF = 1).
FCMOVNB ST(0),ST(i)     DB C0+i    Move the contents of ST(i) into ST(0) if not below (CF = 0).
FCMOVNBE ST(0),ST(i)    DB D0+i    Move the contents of ST(i) into ST(0) if not below or equal
                                   (CF = 0 and ZF = 0).
FCMOVNE ST(0),ST(i)     DB C8+i    Move the contents of ST(i) into ST(0) if not equal (ZF = 0).
FCMOVNU ST(0),ST(i)     DB D8+i    Move the contents of ST(i) into ST(0) if not unordered (PF = 0).
FCMOVU ST(0),ST(i)      DA D8+i    Move the contents of ST(i) into ST(0) if unordered (PF = 1).
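The conditions in the table above can be written out as predicates on CF, ZF, and PF. This is an illustrative sketch; the dictionary `CONDITIONS` and the function `fcmov` are hypothetical names, and the flags are passed in as plain 0/1 integers rather than read from rFLAGS.

```python
CONDITIONS = {
    "FCMOVB":   lambda cf, zf, pf: cf == 1,
    "FCMOVBE":  lambda cf, zf, pf: cf == 1 or zf == 1,
    "FCMOVE":   lambda cf, zf, pf: zf == 1,
    "FCMOVNB":  lambda cf, zf, pf: cf == 0,
    "FCMOVNBE": lambda cf, zf, pf: cf == 0 and zf == 0,
    "FCMOVNE":  lambda cf, zf, pf: zf == 0,
    "FCMOVNU":  lambda cf, zf, pf: pf == 0,
    "FCMOVU":   lambda cf, zf, pf: pf == 1,
}


def fcmov(mnemonic: str, st0: float, sti: float, cf: int, zf: int, pf: int) -> float:
    """Return the new ST(0): ST(i) if the condition holds, otherwise ST(0) unchanged."""
    return sti if CONDITIONS[mnemonic](cf, zf, pf) else st0


moved = fcmov("FCMOVB", st0=1.0, sti=9.0, cf=1, zf=0, pf=0)   # condition true
kept = fcmov("FCMOVE", st0=1.0, sti=9.0, cf=1, zf=0, pf=0)    # condition false
```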

Related Instructions 





None 


rFLAGS Affected 

None 


x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    0        x87 stack underflow, if an x87 register stack fault was detected.
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD          X         X            X      The Conditional Move instructions are not supported, as
                                                           indicated by EDX[FPU] and EDX[CMOV] returned by CPUID
                                                           Fn0000_0001 or Fn8000_0001.
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the control
                                                           register (CR0) was set to 1.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
FCOM Floating-Point Compare

FCOMP 

FCOMPP 

Compares the specified value to the value in ST(0) and sets the C0, C2, and C3 condition code flags in
the x87 status word as shown in the x87 Condition Code table below. The specified value can be in a
floating-point register or a memory location.

The no-operand version compares the value in ST(1) with the value in ST(0).

The comparison operation ignores the sign of zero (-0.0 = +0.0). 

After performing the comparison operation, the FCOMP instruction pops the x87 register stack and 
the FCOMPP instruction pops the x87 register stack twice. 

If either or both of the compared values is a NaN or is in an unsupported format, the FCOMx
instruction sets the invalid-operation exception (IE) bit in the x87 status word to 1, and sets the
condition flags to “unordered.”

The FUCOMx instructions perform the same operations as the FCOMx instructions, but do not set the 
IE bit for QNaNs. 


Mnemonic           Opcode     Description
FCOM               D8 D1      Compare the contents of ST(0) to the contents of ST(1) and
                              set condition flags to reflect the results of the comparison.
FCOM ST(i)         D8 D0+i    Compare the contents of ST(0) to the contents of ST(i) and
                              set condition flags to reflect the results of the comparison.
FCOM mem32real     D8 /2      Compare the contents of ST(0) to the contents of
                              mem32real and set condition flags to reflect the results of
                              the comparison.
FCOM mem64real     DC /2      Compare the contents of ST(0) to the contents of
                              mem64real and set condition flags to reflect the results of
                              the comparison.
FCOMP              D8 D9      Compare the contents of ST(0) to the contents of ST(1), set
                              condition flags to reflect the results of the comparison, and
                              pop the x87 register stack.
FCOMP ST(i)        D8 D8+i    Compare the contents of ST(0) to the contents of ST(i), set
                              condition flags to reflect the results of the comparison, and
                              pop the x87 register stack.
FCOMP mem32real    D8 /3      Compare the contents of ST(0) to the contents of
                              mem32real, set condition flags to reflect the results of the
                              comparison, and pop the x87 register stack.
FCOMP mem64real    DC /3      Compare the contents of ST(0) to the contents of
                              mem64real, set condition flags to reflect the results of the
                              comparison, and pop the x87 register stack.
FCOMPP             DE D9      Compare the contents of ST(0) to the contents of ST(1), set
                              condition flags to reflect the results of the comparison, and
                              pop the x87 register stack twice.


Related Instructions 

FCOMI, FCOMIP, FICOM, FICOMP, FTST, FUCOMI, FUCOMIP, FXAM 

rFLAGS Affected 

None 

x87 Condition Code 


C3    C2    C1    C0    Compare Result
0     0     0     0     ST(0) > source
0     0     0     1     ST(0) < source
1     0     0     0     ST(0) = source
1     1     0     1     Operands were unordered
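The condition-code encoding in the table above can be sketched as a function of the two compared values. The function names are hypothetical, Python floats stand in for x87 operands, and unsupported formats and the IE flag are not modeled. The helper `status_word_bits` assumes the standard status-word positions C0 = bit 8, C2 = bit 10, and C3 = bit 14, which are not listed in this section.

```python
import math


def fcom(st0: float, source: float):
    """Return (C3, C2, C0) for a compare of ST(0) with source; C1 is cleared to 0."""
    if math.isnan(st0) or math.isnan(source):
        return (1, 1, 1)          # operands were unordered
    if st0 > source:
        return (0, 0, 0)
    if st0 < source:
        return (0, 0, 1)
    return (1, 0, 0)              # equal; the sign of zero is ignored


def status_word_bits(c3: int, c2: int, c0: int) -> int:
    """Place the condition codes at their status-word bit positions."""
    return (c3 << 14) | (c2 << 10) | (c0 << 8)


unordered_mask = status_word_bits(*fcom(float("nan"), 1.0))
```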


Exceptions 


Exception                   Real  Virtual-8086  Protected  Cause of Exception
Device not available, #NM    X         X            X      The emulate bit (EM) or the task switch bit (TS) of the
                                                           control register (CR0) was set to 1.
Stack, #SS                   X         X            X      A memory address exceeded the stack segment limit or
                                                           was non-canonical.
General protection, #GP      X         X            X      A memory address exceeded a data segment limit or was
                                                           non-canonical.
                                                    X      A null data segment was used to reference memory.
Page fault, #PF                        X            X      A page fault resulted from the execution of the instruction.
Alignment check, #AC                   X            X      An unaligned memory reference was performed while
                                                           alignment checking was enabled.
x87 floating-point           X         X            X      An unmasked x87 floating-point exception was pending.
exception pending, #MF

x87 Floating-Point Exception Generated, #MF
Invalid-operation            X         X            X      A source operand was an SNaN value, a QNaN value, or
exception (IE)                                             an unsupported format.
Invalid-operation            X         X            X      An x87 stack underflow occurred.
exception (IE) with
stack fault (SF)
Denormalized-operand         X         X            X      A source operand was a denormal value.
exception (DE)
FCOMI Floating-Point Compare and Set Flags

FCOMIP 

Compares the value in ST(0) with the value in another floating-point register and sets the zero flag
(ZF), parity flag (PF), and carry flag (CF) in the rFLAGS register based on the result as shown in the
table in the rFLAGS Affected section.

The comparison operation ignores the sign of zero (-0.0 = +0.0). 

After performing the comparison operation, FCOMIP pops the x87 register stack. 

If either or both of the compared values is a NaN or is in an unsupported format, the FCOMIx 
instruction sets the invalid-operation exception (IE) bit in the x87 status word to 1 and sets the flags to 
“unordered.” 

The FUCOMIx instructions perform the same operations as the FCOMIx instructions, but do not set 
the IE bit for QNaNs. 

Support for the FCOMIx instruction can be determined by executing either CPUID Fn0000_0001 or
CPUID Fn8000_0001. Support is indicated when both EDX[FPU] (bit 0) and EDX[CMOV] (bit 15)
are set.
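
The support test above reduces to checking two bits of the EDX value returned by CPUID. The sketch below assumes that value has already been obtained by some other means, since Python cannot execute CPUID directly. The function name supports_fcomi is hypothetical.

```python
FPU_BIT = 0    # EDX[FPU]
CMOV_BIT = 15  # EDX[CMOV]

def supports_fcomi(edx):
    """Return True when both EDX[FPU] and EDX[CMOV] are set."""
    mask = (1 << FPU_BIT) | (1 << CMOV_BIT)
    return (edx & mask) == mask
```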


Mnemonic            Opcode   Description
FCOMI ST(0),ST(i)   DB F0+i  Compare the contents of ST(0) with the contents of ST(i) and set status flags to reflect the results of the comparison.
FCOMIP ST(0),ST(i)  DF F0+i  Compare the contents of ST(0) with the contents of ST(i), set status flags to reflect the results of the comparison, and pop the x87 register stack.


Related Instructions 

FCOM, FCOMPP, FICOM, FICOMP, FTST, FUCOMI, FUCOMIP, FXAM 

rFLAGS Affected 


ZF   PF   CF   Compare Result
0    0    0    ST(0) > source
0    0    1    ST(0) < source
1    0    0    ST(0) = source
1    1    1    Operands were unordered
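
As an illustration of the flag table above, the following sketch models the ZF, PF, and CF outcome for two operands. It is an approximation under stated assumptions: Python floats are double-precision rather than double-extended-precision, and the model covers only the flag result, not the IE bit or the stack pop. The function name fcomi_flags is hypothetical.

```python
import math

def fcomi_flags(st0, src):
    """Return (ZF, PF, CF) as listed in the FCOMI compare-result table."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1)   # unordered
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)       # equal; -0.0 and +0.0 compare as equal
```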
x87 Condition Code


x87 Condition Code  Value  Description
C0
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
C2
C3

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                                     X     X             X          The conditional move instructions are not supported, as indicated by EDX[FPU] and EDX[CMOV] returned by CPUID Fn0000_0001 or Fn8000_0001.
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value, a QNaN value, or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
FCOS Floating-Point Cosine

Computes the cosine of the radian value in ST(0) and stores the result in ST(0).

If the radian value lies outside the valid range of -2^63 to +2^63 radians, the instruction sets the C2 flag in
the x87 status word to 1 to indicate the value is out of range and does not change the value in ST(0).
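
The range check described above can be sketched as follows. This is an illustrative model for finite inputs only, using double-precision Python floats. The text does not say how the endpoints ±2^63 themselves are treated, so the sketch treats them as in range, which is an assumption. The function name fcos is hypothetical.

```python
import math

LIMIT = 2.0 ** 63

def fcos(st0):
    """Return (new ST(0), C2) for a finite radian value in ST(0)."""
    if abs(st0) > LIMIT:
        # Out of range: C2 is set and ST(0) is left unchanged.
        return (st0, 1)
    return (math.cos(st0), 0)
```

Software that sees C2 = 1 would typically reduce the argument itself before retrying.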

Mnemonic Opcode Description 

FCOS D9 FF Replace ST(0) with the cosine of ST(0). 

Related Instructions 

FPTAN, FPATAN, FSIN, FSINCOS 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  0      Source operand was in range.
                    1      Source operand was out of range.
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FDECSTP Floating-Point Decrement Stack-Top Pointer

Decrements the top-of-stack pointer (TOP) field of the x87 status word. If the TOP field contains 0, it 
is set to 7. In other words, this instruction rotates the stack by one position. 

Mnemonic Opcode Description 

FDECSTP D9 F6 Decrement the TOP field in the x87 status word. 


               Before FDECSTP          After FDECSTP
Data Register  Value  Stack Pointer    Stack Pointer  Value
7              num1   ST(7)            ST(0)          num1
6              num2   ST(6)            ST(7)          num2
5              num3   ST(5)            ST(6)          num3
4              num4   ST(4)            ST(5)          num4
3              num5   ST(3)            ST(4)          num5
2              num6   ST(2)            ST(3)          num6
1              num7   ST(1)            ST(2)          num7
0              num8   ST(0)            ST(1)          num8
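
The rotation shown above follows from treating TOP as a 3-bit counter and naming each physical data register relative to it. The sketch below is an illustrative model only. The helper names fdecstp and stack_names are hypothetical.

```python
def fdecstp(top):
    """Decrement the 3-bit TOP field, wrapping 0 to 7."""
    return (top - 1) & 7

def stack_names(top):
    """Map each physical data register 0..7 to its ST(i) name for a given TOP."""
    return {reg: "ST(%d)" % ((reg - top) & 7) for reg in range(8)}

before = stack_names(0)          # TOP = 0, as in the "Before" columns
after = stack_names(fdecstp(0))  # TOP = 7, as in the "After" columns
```

The register contents do not move. Only the names change, which is why num1 through num8 stay in the same data registers.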


Related Instructions 

FINCSTP 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code  Value  Description
C0                  U
C1                  0
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
FDIV Floating-Point Divide

FDIVP 

FIDIV 

Divides the value in a floating-point register by the value in another register or a memory location and 
stores the result in the register containing the dividend. For the FDIV and FDIVP instructions, the 
divisor value in memory can be stored in single-precision or double-precision floating-point format. 

If only one operand is specified, the instruction divides the value in ST(0) by the value in the specified 
memory location. 

If no operands are specified, the FDIVP instruction divides the value in ST(1) by the value in ST(0), 
stores the result in ST(1), and pops the x87 register stack. 

The FIDIV instruction converts a divisor in word integer or short integer format to double-extended- 
precision floating-point format before performing the division. It treats an integer 0 as +0. 

If the zero-divide exception is not masked (ZM bit cleared to 0 in the x87 control word) and the
operation causes a zero-divide exception (sets the ZE bit in the x87 status word to 1), the operation
stores no result. If the zero-divide exception is masked (ZM bit set to 1), a zero-divide exception
causes ±∞ to be stored.

The sign of the operands, even if one of the operands is 0, determines the sign of the result. 
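
The masked zero-divide behavior and the sign rule can be sketched for finite operands as follows. This is an illustrative model, not the hardware definition: it uses double-precision Python floats, assumes both ZM and IM are set, and returns a NaN for ±zero divided by ±zero as a stand-in for the masked invalid-operation response. The function name fdiv_masked and the returned flag strings are hypothetical.

```python
import math

def fdiv_masked(dividend, divisor):
    """Return (result, flag) for finite operands with ZM and IM masked."""
    if divisor == 0.0:
        if dividend == 0.0:
            return (math.nan, "IE")   # ±zero divided by ±zero
        # Non-zero divided by ±zero: the signs of both operands
        # determine the sign of the stored infinity.
        sign = math.copysign(1.0, dividend) * math.copysign(1.0, divisor)
        return (math.copysign(math.inf, sign), "ZE")
    return (dividend / divisor, None)
```

Note that a zero dividend also carries its sign into the quotient, so 0.0 divided by -5.0 yields -0.0.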


Mnemonic           Opcode   Description
FDIV ST(0),ST(i)   D8 F0+i  Replace ST(0) with ST(0)/ST(i).
FDIV ST(i),ST(0)   DC F8+i  Replace ST(i) with ST(i)/ST(0).
FDIV mem32real     D8 /6    Replace ST(0) with ST(0)/mem32real.
FDIV mem64real     DC /6    Replace ST(0) with ST(0)/mem64real.
FDIVP              DE F9    Replace ST(1) with ST(1)/ST(0), and pop the x87 register stack.
FDIVP ST(i),ST(0)  DE F8+i  Replace ST(i) with ST(i)/ST(0), and pop the x87 register stack.
FIDIV mem16int     DE /6    Replace ST(0) with ST(0)/mem16int.
FIDIV mem32int     DA /6    Replace ST(0) with ST(0)/mem32int.


Related Instructions 

FDIVR, FDIVRP, FIDIVR 

rFLAGS Affected 

None 
x87 Condition Code


x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          ±infinity was divided by ±infinity.
                                                        X     X             X          ±zero was divided by ±zero.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Zero-divide exception (ZE)                              X     X             X          A non-zero value was divided by ±zero.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FDIVR Floating-Point Divide Reverse

FDIVRP 

FIDIVR 

Divides a value in a floating-point register or a memory location by the value in a floating-point
register and stores the result in the register containing the divisor. For the FDIVR and FDIVRP
instructions, a dividend value in memory can be stored in single-precision or double-precision
floating-point format.

If one operand is specified, the instruction divides the value at the specified memory location by the 
value in ST(0). If two operands are specified, it divides the value in ST(0) by the value in another x87 
stack register or vice versa. 

The FIDIVR instruction converts a dividend in word integer or short integer format to double- 
extended-precision format before performing the division. 

The FDIVRP instruction pops the x87 register stack after performing the division operation. If no
operand is specified, the FDIVRP instruction divides the value in ST(0) by the value in ST(1).
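
The difference between the no-operand forms of FDIVP and FDIVRP is only the operand order. The following sketch models the register stack as a Python list whose index 0 is ST(0). The list representation and the function names fdivp and fdivrp are assumptions made for illustration.

```python
def fdivp(stack):
    """FDIVP, no operands: ST(1) = ST(1) / ST(0), then pop."""
    st0, st1 = stack[0], stack[1]
    return [st1 / st0] + stack[2:]

def fdivrp(stack):
    """FDIVRP, no operands: ST(1) = ST(0) / ST(1), then pop."""
    st0, st1 = stack[0], stack[1]
    return [st0 / st1] + stack[2:]
```

With ST(0) = 2.0 and ST(1) = 8.0, FDIVP leaves 4.0 on top of the stack while FDIVRP leaves 0.25.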

If the zero-divide exception is not masked (ZM bit cleared to 0 in the x87 control word) and the
operation causes a zero-divide exception (sets the ZE bit in the x87 status word to 1), the operation
stores no result. If the zero-divide exception is masked (ZM bit set to 1), a zero-divide exception
causes ±∞ to be stored.

The sign of the operands, even if one of the operands is 0, determines the sign of the result. 


Mnemonic            Opcode   Description
FDIVR ST(0),ST(i)   D8 F8+i  Replace ST(0) with ST(i)/ST(0).
FDIVR ST(i),ST(0)   DC F0+i  Replace ST(i) with ST(0)/ST(i).
FDIVR mem32real     D8 /7    Replace ST(0) with mem32real/ST(0).
FDIVR mem64real     DC /7    Replace ST(0) with mem64real/ST(0).
FDIVRP              DE F1    Replace ST(1) with ST(0)/ST(1), and pop the x87 register stack.
FDIVRP ST(i),ST(0)  DE F0+i  Replace ST(i) with ST(0)/ST(i), and pop the x87 register stack.
FIDIVR mem16int     DE /7    Replace ST(0) with mem16int/ST(0).
FIDIVR mem32int     DA /7    Replace ST(0) with mem32int/ST(0).


Related Instructions 

FDIV, FDIVP, FIDIV 
rFLAGS Affected

None 


x87 Condition Code 


x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          ±infinity was divided by ±infinity.
                                                        X     X             X          ±zero was divided by ±zero.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Zero-divide exception (ZE)                              X     X             X          A non-zero value was divided by ±zero.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FFREE Floating-Point Free Register

Frees the specified x87 stack register by marking its tag register entry as empty. The instruction does 
not affect the contents of the freed register or the top-of-stack pointer (TOP). 
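
The effect of FFREE on the tag word can be sketched as a bit operation. The sketch assumes the usual x87 tag-word layout, which is not spelled out in this section: two tag bits per physical data register, with register 0 in bits 1:0 and register 7 in bits 15:14, and ST(i) naming physical register (TOP + i) mod 8. The function name ffree is hypothetical.

```python
EMPTY = 0b11  # tag value for an empty register

def ffree(tag_word, top, i):
    """Return the tag word after FFREE ST(i); TOP and the data are untouched."""
    reg = (top + i) & 7                        # physical register named by ST(i)
    return (tag_word | (EMPTY << (2 * reg))) & 0xFFFF
```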


Mnemonic     Opcode   Description
FFREE ST(i)  DD C0+i  Set the tag for x87 stack register i to empty (11b).

Related Instructions

FLD, FST, FSTP

rFLAGS Affected

None

x87 Condition Code


x87 Condition Code  Value  Description
C0                  U
C1                  U
C2                  U
C3                  U


Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
FICOM Floating-Point Integer Compare

FICOMP 

Converts a 16-bit or 32-bit signed integer value to double-extended-precision format, compares it to
the value in ST(0), and sets the C0, C2, and C3 condition code flags in the x87 status word to reflect
the results.

The comparison operation ignores the sign of zero (-0.0 = +0.0). 

After performing the comparison operation, the FICOMP instruction pops the x87 register stack. 

If ST(0) is a NaN or is in an unsupported format, the instruction sets the condition flags to 
“unordered.” 


Mnemonic         Opcode  Description
FICOM mem16int   DE /2   Convert the contents of mem16int to double-extended-precision format, compare the result to the contents of ST(0), and set condition flags to reflect the results of the comparison.
FICOM mem32int   DA /2   Convert the contents of mem32int to double-extended-precision format, compare the result to the contents of ST(0), and set condition flags to reflect the results of the comparison.
FICOMP mem16int  DE /3   Convert the contents of mem16int to double-extended-precision format, compare the result to the contents of ST(0), set condition flags to reflect the results of the comparison, and pop the x87 register stack.
FICOMP mem32int  DA /3   Convert the contents of mem32int to double-extended-precision format, compare the result to the contents of ST(0), set condition flags to reflect the results of the comparison, and pop the x87 register stack.


Related Instructions 

FCOM, FCOMPP, FCOMI, FCOMIP, FTST, FUCOMI, FUCOMIP, FXAM 


rFLAGS Affected 

None 
x87 Condition Code


C3   C2   C1   C0   Compare Result
0    0    0    0    ST(0) > source
0    0    0    1    ST(0) < source
1    0    0    0    ST(0) = source
1    1    0    1    Operands were unordered


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value, a QNaN value, or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
FILD Floating-Point Load Integer

Converts a signed integer in memory to double-extended-precision format and pushes the value onto
the x87 register stack. The value can be a 16-bit, 32-bit, or 64-bit integer value. Signed values from
memory can always be represented exactly in x87 registers without rounding.
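
The exactness claim above follows from the 64-bit significand of the double-extended-precision format: every signed 64-bit integer needs at most 64 significant bits. The following sketch checks this with Python integers and contrasts it with the 53-bit significand of double precision. The function name fits_significand is hypothetical.

```python
def fits_significand(value, bits):
    """True if |value| is exactly representable with a `bits`-bit significand."""
    magnitude = abs(value)
    if magnitude == 0:
        return True
    # Trailing zero bits are absorbed by the exponent, so strip them first.
    magnitude >>= (magnitude & -magnitude).bit_length() - 1
    return magnitude.bit_length() <= bits

largest = 2**63 - 1     # largest signed 64-bit integer
smallest = -(2**63)     # smallest signed 64-bit integer
```

By contrast, converting the largest signed 64-bit integer to double precision rounds it, since 63 significant bits do not fit in 53.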


Mnemonic       Opcode  Description
FILD mem16int  DF /0   Push the contents of mem16int onto the x87 register stack.
FILD mem32int  DB /0   Push the contents of mem32int onto the x87 register stack.
FILD mem64int  DF /5   Push the contents of mem64int onto the x87 register stack.


Related Instructions 

FLD, FST, FSTP, FIST, FISTP, FBLD, FBSTP 

rFLAGS Affected 


None 


x87 Condition Code 


x87 Condition Code  Value  Description
C0                  U
C1                  0      No stack overflow.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.
FINCSTP Floating-Point Increment Stack-Top Pointer

Increments the top-of-stack pointer (TOP) field of the x87 status word. If the TOP field contains 7, it is 
cleared to 0. In other words, this instruction rotates the stack by one position. 

Mnemonic Opcode Description 

FINCSTP D9 F7 Increment the TOP field in the x87 status word. 


               Before FINCSTP          After FINCSTP
Data Register  Value  Stack Pointer    Stack Pointer  Value
7              num1   ST(7)            ST(6)          num1
6              num2   ST(6)            ST(5)          num2
5              num3   ST(5)            ST(4)          num3
4              num4   ST(4)            ST(3)          num4
3              num5   ST(3)            ST(2)          num5
2              num6   ST(2)            ST(1)          num6
1              num7   ST(1)            ST(0)          num7
0              num8   ST(0)            ST(7)          num8
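
The table above corresponds to TOP moving from 0 to 1. As a small illustrative model, the sketch below shows the wraparound and the physical register that each ST(i) names after the increment. The function names fincstp and physical_register are hypothetical.

```python
def fincstp(top):
    """Increment the 3-bit TOP field, wrapping 7 to 0."""
    return (top + 1) & 7

def physical_register(top, i):
    """Return the data register number that ST(i) names for a given TOP."""
    return (top + i) & 7

top_after = fincstp(0)   # TOP = 1, as in the "After" columns
```

Eight consecutive increments return TOP to its starting value, since the field is only three bits wide.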


Related Instructions 

FDECSTP 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code  Value  Description
C0                  U
C1                  0
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
FINIT Floating-Point Initialize

FNINIT 

Sets the x87 control word register, status word register, tag word register, instruction pointer, and data 
pointer to their default states as follows: 

• Sets the x87 control word to 037Fh: round to nearest (RC = 00b); double-extended-precision (PC
= 11b); all exceptions masked (PM, UM, OM, ZM, DM, and IM all set to 1).

• Clears all bits in the x87 status word (TOP is set to 0, which maps ST(0) onto FPR0).

• Marks all x87 stack registers as empty (11b) in the x87 tag register.

• Clears the instruction pointer and the data pointer. 
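
The default control word value can be checked field by field. The sketch below assumes the usual x87 control-word layout, which is not restated in this section: exception masks IM, DM, ZM, OM, UM, and PM in bits 0 through 5, PC in bits 9:8, and RC in bits 11:10. The function name decode_control_word is hypothetical.

```python
def decode_control_word(cw):
    """Split an x87 control word into its RC, PC, and exception-mask fields."""
    mask_names = ["IM", "DM", "ZM", "OM", "UM", "PM"]   # bits 0..5
    return {
        "RC": (cw >> 10) & 0b11,
        "PC": (cw >> 8) & 0b11,
        "masks": {name: (cw >> bit) & 1 for bit, name in enumerate(mask_names)},
    }

default = decode_control_word(0x037F)
```

For 037Fh this gives RC = 00b, PC = 11b, and all six mask bits set, matching the defaults listed above.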

These instructions do not actually zero out the x87 stack registers. 

Assemblers usually provide an FINIT macro that expands into the instruction sequence 

WAIT ; Opcode 9B

FNINIT ; Opcode DB E3

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler, if 
necessary. The FNINIT instruction then resets the x87 environment to its default state. 


Mnemonic  Opcode    Description
FINIT     9B DB E3  Perform a WAIT (9B) to check for pending floating-point exceptions and then initialize the x87 unit.
FNINIT    DB E3     Initialize the x87 unit without checking for unmasked floating-point exceptions.


Related Instructions

FWAIT, WAIT

rFLAGS Affected

None
x87 Condition Code


x87 Condition Code  Value  Description
C0                  0
C1                  0
C2                  0
C3                  0

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
FIST Floating-Point Integer Store

FISTP 

Converts the value in ST(0) to a signed integer, rounds it if necessary, and copies it to the specified 
memory location. The rounding control (RC) field of the x87 control word determines the type of 
rounding used. 

The FIST instruction supports 16-bit and 32-bit values. The FISTP instruction supports 16-bit, 32-bit,
and 64-bit values.

The FISTP instruction pops the stack after storing the rounded value in memory. 

If the value is too large for the destination location, is a NaN, or is in an unsupported format, the 
instruction sets the invalid-operation exception (IE) bit in the x87 status word to 1. Then, if the 
exception is masked (IM bit set to 1 in the x87 control word), the instruction stores the integer 
indefinite value. If the exception is unmasked (IM bit cleared to 0), the instruction does not store the 
value. 
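
The conversion described above can be sketched as follows. The sketch is illustrative only and rests on several assumptions not stated in this section: the RC encodings 00b (nearest, ties to even), 01b (toward negative infinity), 10b (toward positive infinity), and 11b (toward zero); the integer indefinite value being the most negative integer of the destination size; and the invalid-operation exception being masked. The function name fist is hypothetical.

```python
import math

def fist(value, bits, rc):
    """Return the integer stored by FIST for a `bits`-wide destination (IM = 1)."""
    rounders = {
        0b00: round,        # Python's round() resolves ties to even
        0b01: math.floor,
        0b10: math.ceil,
        0b11: math.trunc,
    }
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    indefinite = lowest     # assumed encoding of the integer indefinite value
    if math.isnan(value) or math.isinf(value):
        return indefinite
    result = rounders[rc](value)
    if result < lowest or result > highest:
        return indefinite
    return result
```

For example, a value between -1 and -0 is stored as 0 or -1 depending on the rounding mode.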


Mnemonic        Opcode  Description
FIST mem16int   DF /2   Convert the contents of ST(0) to integer and store the result in mem16int.
FIST mem32int   DB /2   Convert the contents of ST(0) to integer and store the result in mem32int.
FISTP mem16int  DF /3   Convert the contents of ST(0) to integer, store the result in mem16int, and pop the x87 register stack.
FISTP mem32int  DB /3   Convert the contents of ST(0) to integer, store the result in mem32int, and pop the x87 register stack.
FISTP mem64int  DF /7   Convert the contents of ST(0) to integer, store the result in mem64int, and pop the x87 register stack.


Table 2-1 shows the results of storing various types of numbers as integers. 


Table 2-1. Storing Numbers as Integers


ST(0)                    DEST
-∞                       Invalid-operation (IE) exception
-Finite-real < -1        -Integer (Invalid-operation (IE) exception if the integer is too large for the destination)
-1 < -Finite-real < -0   0 or -1, depending on the rounding mode
-0                       0
+0                       0
+0 < +Finite-real < +1   0 or +1, depending on the rounding mode
+Finite-real > +1        +Integer (Invalid-operation (IE) exception if the integer is too large for the destination)
+∞                       Invalid-operation (IE) exception
NaN                      Invalid-operation (IE) exception


Related Instructions 

FLD, FST, FSTP, FILD, FBLD, FBSTP, FISTTP 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
Exceptions


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          The source operand was too large for the destination format.
                                                        X     X             X          A source operand was an SNaN value, a QNaN value, ±infinity, or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FISTTP Floating Point Integer Truncate and Store

Converts a floating-point value in ST(0) to an integer by truncating the fractional part of the number 
and storing the integer result to the memory address specified by the destination operand. FISTTP then 
pops the floating point register stack. The FISTTP instruction ignores the rounding mode specified by 
the x87 control word. 

The FISTTP instruction applies to 16-bit, 32-bit, and 64-bit operands. 

The FISTTP instruction is an SSE3 instruction. Support for this instruction subset is indicated by 
CPUID Fn0000_0001_ECX[SSE3] = 1. See “CPUID” in Volume 3 for more information about the 
CPUID instruction. 


Mnemonic         Opcode  Description
FISTTP mem16int  DF /1   Store the truncated floating-point value in ST(0) in memory location mem16int and pop the floating-point register stack.
FISTTP mem32int  DB /1   Store the truncated floating-point value in ST(0) in memory location mem32int and pop the floating-point register stack.
FISTTP mem64int  DD /1   Store the truncated floating-point value in ST(0) in memory location mem64int and pop the floating-point register stack.


Table 2-2 shows the results of storing various types of numbers as integers. 


Table 2-2. Storing Numbers as Integers


ST(0)                   DESTINATION
-∞                      Invalid-operation (IE) exception
-Finite-real < -1       -Integer (Invalid-operation (IE) exception if the integer is too large for the destination)
-1 < Finite-real < +1   0
+Finite-real > +1       +Integer (Invalid-operation (IE) exception if the integer is too large for the destination)
+∞                      Invalid-operation (IE) exception
NaN                     Invalid-operation (IE) exception
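
The rows of Table 2-2 can be reproduced with a truncating conversion. The sketch below is illustrative only. It assumes the invalid-operation exception is masked and that the value stored in that case is the most negative integer of the destination size, which this section does not state. The function name fisttp is hypothetical.

```python
import math

def fisttp(value, bits):
    """Return the integer stored by FISTTP for a `bits`-wide destination (IM = 1)."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if math.isnan(value) or math.isinf(value):
        return lowest              # assumed integer indefinite value
    result = math.trunc(value)     # the RC field is ignored; always truncate
    if result < lowest or result > highest:
        return lowest
    return result
```

Every value strictly between -1 and +1 is stored as 0, regardless of the rounding mode in the x87 control word.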


Related Instructions 

FLD, FST, FSTP, FILD, FBLD, FBSTP, FISTP 

rFLAGS Affected 

None 
x87 Condition Code


x87 Condition Code  Value*  Description
C0                  U
C1                  0       x87 stack underflow, if an x87 register stack fault was detected.
                    0       FP number is rounded down (always done since the instruction forces truncate mode).
C2                  U
C3                  U

Note: *A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                                     X     X             X          The SSE3 instructions are not supported, as indicated by CPUID Fn0000_0001_ECX[SSE3] = 0.
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          The source operand was too large for the destination format.
                                                        X     X             X          A source operand was an SNaN value, a QNaN value, ±infinity, or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
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FLD Floating-Point Load 

Pushes a value in memory or in a floating-point register onto the register stack. If in memory, the value 
can be a single-precision, double-precision, or double-extended-precision floating-point value. The 
operation converts a single-precision or double-precision value to double-extended-precision format 
before pushing it onto the stack. 


Mnemonic       Opcode   Description
FLD ST(i)      D9 C0+i  Push the contents of ST(i) onto the x87 register stack.
FLD mem32real  D9 /0    Push the contents of mem32real onto the x87 register stack.
FLD mem64real  DD /0    Push the contents of mem64real onto the x87 register stack.
FLD mem80real  DB /5    Push the contents of mem80real onto the x87 register stack.


Related Instructions 

FFREE, FST, FSTP, FILD, FIST, FISTP, FBLD, FBSTP 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
                    0      No x87 stack fault.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value. This exception does not occur if the source operand was in double-extended-precision format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
                                                        X     X             X          An x87 stack overflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value. This exception does not occur if the source operand was in double-extended-precision format.

FLD1 Floating-Point Load +1.0

Pushes the floating-point value +1.0 onto the register stack. 


Mnemonic Opcode Description 

FLD1 D9 E8 Push +1.0 onto the x87 register stack. 

Related Instructions 

FLD, FLDZ, FLDPI, FLDL2T, FLDL2E, FLDLG2, FLDLN2 

rFLAGS Affected 

None 


x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FLDCW Floating-Point Load x87 Control Word

Loads a 16-bit value from the specified memory location into the x87 control word. If the new x87
control word unmasks any pending floating-point exceptions, they are handled upon execution of
the next x87 floating-point or 64-bit media instruction.

To avoid generating exceptions when loading a new control word, use the FCLEX or FNCLEX 
instruction to clear any pending exceptions. 


Mnemonic Opcode Description

FLDCW mem2env D9 /5 Load the contents of mem2env into the x87 control word.


Related Instructions 

FSTCW, FNSTCW, FSTSW, FNSTSW, FSTENV, FNSTENV, FLDENV, FCLEX, FNCLEX 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  U
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.



AMD64 Technology 


26569—Rev. 3.15—May 2018 


FLDENV Floating-Point Load x87 Environment 

Restores the x87 environment from memory starting at the specified address. The x87 environment 
consists of the x87 control, status, and tag word registers, the last non-control x87 instruction pointer, 
the last x87 data pointer, and the opcode of the last completed non-control x87 instruction. 

The FLDENV instruction takes a memory operand that specifies the starting address of either a 14- 
byte or 28-byte area in memory. The 14-byte operand is required for a 16-bit operand-size; the 28-byte 
memory area is required for both 32-bit and 64-bit operand sizes. The layout of the saved x87 
environment within the specified memory area depends on whether the processor is operating in 
protected or real mode. See “Media and x87 Processor State” in Volume 2 for details on how this 
instruction loads the x87 environment from memory. (Because FSTENV does not save the full 64-bit 
data and instruction pointers, 64-bit applications should use FXSAVE/FXRSTOR, rather than 
FSTENV/FLDENV.) 

The environment to be loaded is typically stored by a previous FNSTENV or FSTENV instruction. 
The FLDENV instruction should be executed in the same operating mode as the instruction that stored 
the x87 environment. 

If FLDENV results in set exception flags in the loaded x87 status word register, and these exceptions 
are unmasked in the x87 control word register, a floating-point exception occurs when the next 
floating-point instruction is executed (except for the no-wait floating-point instructions). 

To avoid generating exceptions when loading a new environment, use the FCLEX or FNCLEX 
instruction to clear the exception flags in the x87 status word before storing that environment. 


Mnemonic Opcode Description

FLDENV mem14/28env D9 /4 Load the complete contents of the x87 environment from mem14/28env.

Related Instructions 

FSTENV, FNSTENV, FCLEX, FNCLEX 

rFLAGS Affected 

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  M      Loaded from memory.
C1                  M      Loaded from memory.
C2                  M      Loaded from memory.
C3                  M      Loaded from memory.

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

FLDL2E Floating-Point Load Log₂ e

Pushes log₂ e onto the x87 register stack. The value in ST(0) is the result, in double-extended-precision
format, of rounding an internal 66-bit constant according to the setting of the RC field in the x87
control word register.
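As a rough cross-check of the constants pushed by FLDL2E and the related load-constant instructions (FLDL2T, FLDLG2, FLDLN2, FLDPI), the following Python sketch computes double-precision approximations. The dictionary name X87_CONSTANTS is hypothetical, and the values carry only a 53-bit significand, so they are not the double-extended-precision results that the instructions produce.

```python
import math

# Double-precision approximations only; the x87 instructions instead round
# an internal 66-bit constant to double-extended precision.
X87_CONSTANTS = {
    "FLDL2E": math.log2(math.e),   # log2(e)
    "FLDL2T": math.log2(10.0),     # log2(10)
    "FLDLG2": math.log10(2.0),     # log10(2)
    "FLDLN2": math.log(2.0),       # ln(2)
    "FLDPI": math.pi,              # pi
}

for mnemonic, value in X87_CONSTANTS.items():
    print(f"{mnemonic:7s} {value!r}")
```

The pairs are reciprocals of each other (log₂ e with logₑ 2, and log₂ 10 with log₁₀ 2), which gives a simple sanity check on any table of these constants.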


Mnemonic Opcode Description 

FLDL2E D9 EA Push log₂ e onto the x87 register stack.

Related Instructions 

FLD, FLD1, FLDZ, FLDPI, FLDL2T, FLDLG2, FLDLN2 

rFLAGS Affected 

None 


x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FLDL2T Floating-Point Load Log₂ 10

Pushes log₂ 10 onto the x87 register stack. The value in ST(0) is the result, in double-extended-
precision format, of rounding an internal 66-bit constant according to the setting of the RC field in the
x87 control word register.

Mnemonic Opcode Description

FLDL2T D9 E9 Push log₂ 10 onto the x87 register stack.

Related Instructions 

FLD, FLD1, FLDZ, FLDPI, FLDL2E, FLDLG2, FLDLN2 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FLDLG2 Floating-Point Load Log₁₀ 2

Pushes log₁₀ 2 onto the x87 register stack. The value in ST(0) is the result, in double-extended-
precision format, of rounding an internal 66-bit constant according to the setting of the RC field in the
x87 control word register.

Mnemonic Opcode Description

FLDLG2 D9 EC Push log₁₀ 2 onto the x87 register stack.

Related Instructions 

FLD, FLD1, FLDZ, FLDPI, FLDL2T, FLDL2E, FLDLN2 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FLDLN2 Floating-Point Load Ln 2

Pushes logₑ 2 onto the x87 register stack. The value in ST(0) is the result, in double-extended-precision
format, of rounding an internal 66-bit constant according to the setting of the RC field in the x87
control word register.

Mnemonic Opcode Description

FLDLN2 D9 ED Push logₑ 2 onto the x87 register stack.

Related Instructions 

FLD, FLD1, FLDZ, FLDPI, FLDL2T, FLDL2E, FLDLG2 

rFLAGS Affected 

None 


x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FLDPI Floating-Point Load Pi

Pushes π onto the x87 register stack. The value in ST(0) is the result, in double-extended-precision
format, of rounding an internal 66-bit constant according to the setting of the RC field in the x87
control word register.

Mnemonic Opcode Description

FLDPI D9 EB Push π onto the x87 register stack.

Related Instructions 

FLD, FLD1, FLDZ, FLDL2T, FLDL2E, FLDLG2, FLDLN2 

rFLAGS Affected 

None 


x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FLDZ Floating-Point Load +0.0

Pushes +0.0 onto the x87 register stack. 


Mnemonic Opcode Description 

FLDZ D9 EE Push zero onto the x87 register stack. 

Related Instructions 

FLD, FLD1, FLDPI, FLDL2T, FLDL2E, FLDLG2, FLDLN2 

rFLAGS Affected 

None 


x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No x87 stack fault occurred.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack overflow occurred.

FMUL Floating-Point Multiply

FMULP 

FIMUL 

Multiplies the value in a floating-point register by the value in a memory location or another stack
register and stores the result in the first register. The instruction converts a single-precision or
double-precision value in memory to double-extended-precision format before multiplying.

If one operand is specified, the instruction multiplies the value in the ST(0) register by the value in the 
specified memory location and stores the result in the ST(0) register. 

If two operands are specified, the instruction multiplies the value in the ST(0) register by the value in 
another specified floating-point register and stores the result in the register specified in the first 
operand. 

The FMULP instruction pops the x87 stack after storing the product. The no-operand version of the 
FMULP instruction multiplies the value in the ST(1) register by the value in the ST(0) register and 
stores the product in the ST(1) register. 

The FIMUL instruction converts a short-integer or word-integer value in memory to double-extended- 
precision format, multiplies it by the value in ST(0), and stores the product in ST(0). 
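The register-to-register forms described above can be illustrated with a small Python model in which a list stands for the x87 register stack and element 0 is ST(0). The function names fmul and fmulp and the list representation are assumptions made for this sketch; Python floats are double precision, not double-extended precision, and stack faults and exceptions are not modeled.

```python
def fmul(stack, dest, src):
    """Model FMUL ST(dest),ST(src); one of the two operands must be ST(0)."""
    if dest != 0 and src != 0:
        raise ValueError("one operand must be ST(0)")
    stack[dest] = stack[dest] * stack[src]   # result goes to the first operand

def fmulp(stack, i=1):
    """Model FMULP ST(i),ST(0): store the product in ST(i), then pop."""
    stack[i] = stack[0] * stack[i]
    stack.pop(0)                             # the old ST(1) becomes ST(0)

regs = [2.0, 3.0, 5.0]   # ST(0), ST(1), ST(2)
fmul(regs, 2, 0)         # ST(2) = ST(2) * ST(0)
fmulp(regs)              # no-operand form: ST(1) = ST(0) * ST(1), then pop
print(regs)
```

Because FMULP pops after storing, the product written to ST(1) is found in ST(0) once the instruction completes.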


Mnemonic           Opcode   Description
FMUL ST(0),ST(i)   D8 C8+i  Replace ST(0) with ST(0) * ST(i).
FMUL ST(i),ST(0)   DC C8+i  Replace ST(i) with ST(0) * ST(i).
FMUL mem32real     D8 /1    Replace ST(0) with mem32real * ST(0).
FMUL mem64real     DC /1    Replace ST(0) with mem64real * ST(0).
FMULP              DE C9    Replace ST(1) with ST(0) * ST(1), and pop the x87 register stack.
FMULP ST(i),ST(0)  DE C8+i  Replace ST(i) with ST(0) * ST(i), and pop the x87 register stack.
FIMUL mem16int     DE /1    Replace ST(0) with mem16int * ST(0).
FIMUL mem32int     DA /1    Replace ST(0) with mem32int * ST(0).


Related Instructions 

None 

rFLAGS Affected 

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          ±infinity was multiplied by ±zero.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.

FNOP Floating-Point No Operation

Performs no operation. This instruction affects only the rIP register. It does not otherwise affect the 
processor context. 

Mnemonic Opcode Description 

FNOP D9 D0 Perform no operation.

Related Instructions 

FWAIT, NOP 

rFLAGS Affected 

None 

x87 Condition Code 

None 

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

FPATAN Floating-Point Partial Arctangent

Computes the arctangent of the ordinate (Y) in ST(1) divided by the abscissa (X) in ST(0), which is the
angle in radians between the X axis and the radius vector from the origin to the point (X, Y). It then
stores the result in ST(1) and pops the x87 register stack. The resulting value has the same sign as the
ordinate value and a magnitude less than or equal to π.

There is no restriction on the range of values that FPATAN can accept. Table 2-3 shows the results 
obtained when computing the arctangent of various classes of numbers, assuming that underflow does 
not occur: 


Table 2-3. Computing Arctangent of Numbers

                                       X (ST(0))
Y (ST(1))  -∞      -Finite      -0      +0      +Finite      +∞      NaN
-∞         -3π/4   -π/2         -π/2    -π/2    -π/2         -π/4    NaN
-Finite    -π      -π to -π/2   -π/2    -π/2    -π/2 to -0   -0      NaN
-0         -π      -π           -π      -0      -0           -0      NaN
+0         +π      +π           +π      +0      +0           +0      NaN
+Finite    +π      +π to +π/2   +π/2    +π/2    +π/2 to +0   +0      NaN
+∞         +3π/4   +π/2         +π/2    +π/2    +π/2         +π/4    NaN
NaN        NaN     NaN          NaN     NaN     NaN          NaN     NaN
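The special cases in Table 2-3 match the conventional two-argument arctangent. As an illustration only, the Python sketch below uses math.atan2; the wrapper name fpatan is hypothetical, the argument order (ST(0) first) is a choice made for this example, and Python floats are double precision rather than double-extended precision.

```python
import math

def fpatan(st0_x: float, st1_y: float) -> float:
    """Return arctan(ST(1)/ST(0)) with the quadrant chosen from both signs."""
    return math.atan2(st1_y, st0_x)

inf = math.inf
print(fpatan(-inf, -inf))   # row Y = -inf, column X = -inf
print(fpatan(-1.0, -0.0))   # row Y = -0,   column X = -Finite
print(fpatan(0.0, 0.0))     # row Y = +0,   column X = +0
```

Signed zeros matter here: a result of -0 and a result of +0 compare equal, so a check such as math.copysign is needed to tell them apart.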


Mnemonic Opcode Description

FPATAN D9 F3 Compute arctan(ST(1)/ST(0)), store the result in ST(1), and pop the x87 register stack.


Related Instructions 

FCOS, FPTAN, FSIN, FSINCOS 

rFLAGS Affected 

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.

FPREM Floating-Point Partial Remainder

Computes the exact remainder obtained by dividing the value in ST(0) by that in ST(1), and stores the 
result in ST(0). It computes the remainder by an iterative subtract-and-shift long division algorithm in 
which one quotient bit is calculated in each iteration. 

If the exponent difference between ST(0) and ST(1) is less than 64, the instruction computes all integer 
bits of the quotient, guaranteeing that the remainder is less in magnitude than the divisor in ST(1). If 
the exponent difference is equal to or greater than 64, it computes only the subset of integer quotient 
bits numbering between 32 and 63, returns a partial remainder, and sets the C2 condition code bit to 1. 

FPREM is supported for software that was written for early x87 coprocessors. Unlike the FPREM1
instruction, FPREM does not compute the partial remainder as specified in IEEE Standard 754.

Mnemonic Opcode Description

FPREM D9 F8 Compute the remainder of the division of ST(0) by ST(1) and store the result in ST(0).


Action 

ExpDiff = Exponent(ST(0)) - Exponent(ST(1))
IF (ExpDiff < 0)
{
    SW.C2 = 0
    {SW.C0, SW.C3, SW.C1} = 0
}
ELSIF (ExpDiff < 64)
{
    Quotient = Truncate(ST(0)/ST(1))
    ST(0) = ST(0) - (ST(1) * Quotient)
    SW.C2 = 0
    {SW.C0, SW.C3, SW.C1} = Quotient mod 8
}
ELSE
{
    N = 32 + (ExpDiff mod 32)
    Quotient = Truncate((ST(0)/ST(1))/2^(ExpDiff-N))
    ST(0) = ST(0) - (ST(1) * Quotient * 2^(ExpDiff-N))
    SW.C2 = 1
    {SW.C0, SW.C3, SW.C1} = 0
}
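The Action pseudocode can be exercised with a small Python model. This is a sketch under stated assumptions, not the processor's implementation: the names fprem and full_remainder are hypothetical, Python floats (double precision) stand in for double-extended-precision registers, exact rational arithmetic replaces the subtract-and-shift hardware algorithm, the condition-code bits are taken from the magnitude of the quotient, and exceptions are not modeled.

```python
import math
from fractions import Fraction

def fprem(st0: float, st1: float):
    """Return (new_st0, c0, c1, c2, c3) for finite operands, st1 nonzero."""
    # frexp exponents differ from the architectural ones by the same
    # offset for both operands, so their difference equals ExpDiff.
    exp_diff = math.frexp(st0)[1] - math.frexp(st1)[1]
    if st0 == 0.0 or exp_diff < 0:
        # ST(0) is left unchanged; treating a zero dividend as already
        # reduced is a shortcut of this sketch.
        return st0, 0, 0, 0, 0
    a, b = Fraction(st0), Fraction(st1)      # exact rational arithmetic
    if exp_diff < 64:
        quotient = math.trunc(a / b)
        low = abs(quotient) % 8              # assumption: bits of |Quotient|
        c0, c3, c1 = (low >> 2) & 1, (low >> 1) & 1, low & 1
        return float(a - b * quotient), c0, c1, 0, c3
    n = 32 + exp_diff % 32
    scale = Fraction(2) ** (exp_diff - n)
    quotient = math.trunc(a / b / scale)
    return float(a - b * quotient * scale), 0, 0, 1, 0   # partial remainder

def full_remainder(st0: float, st1: float) -> float:
    """Repeat fprem until C2 reports that the remainder is complete."""
    while True:
        st0, _c0, _c1, c2, _c3 = fprem(st0, st1)
        if c2 == 0:
            return st0

print(fprem(7.0, 2.0))
print(full_remainder(2.0 ** 70, 3.0))
```

The loop in full_remainder mirrors how software typically uses the instruction: it is re-executed while C2 is 1, because each pass reduces the exponent difference until the complete remainder is produced.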

Related Instructions 

FPREM1, FABS, FRNDINT, FXTRACT, FCHS

rFLAGS Affected 

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  M      Set equal to the value of bit 2 of the quotient.
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
                    M      Set equal to the value of bit 0 of the quotient, if there was no fault.
C2                  0      FPREM generated the partial remainder.
                    1      The source operands differed by more than a factor of 2^64, so the result is incomplete.
C3                  M      Set equal to the value of bit 1 of the quotient.

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          ST(0) was ±infinity.
                                                        X     X             X          ST(1) was ±zero.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.

FPREM1 Floating-Point Partial Remainder

Computes the IEEE Standard 754 remainder obtained by dividing the value in ST(0) by that in ST(1), 
and stores the result in ST(0). Unlike FPREM, it rounds the integer quotient to the nearest even integer 
and returns the remainder corresponding to the back multiply of the rounded quotient. 

If the exponent difference between ST(0) and ST(1) is less than 64, the instruction computes all integer 
as well as additional fractional bits of the quotient to do the rounding. The remainder returned is a 
complete remainder and is less than or equal to one half of the magnitude of the divisor. If the exponent 
difference is equal to or greater than 64, it computes only the subset of integer quotient bits numbering 
between 32 and 63, returns the partial remainder, and sets the C2 condition code bit to 1. 

Rounding control has no effect. FPREM1 results are exact.


Mnemonic Opcode Description

FPREM1 D9 F5 Compute the IEEE standard 754 remainder of the division of ST(0) by ST(1) and store the result in ST(0).

Action

ExpDiff = Exponent(ST(0)) - Exponent(ST(1))
IF (ExpDiff < 0)
{
    SW.C2 = 0
    {SW.C0, SW.C3, SW.C1} = 0
}
ELSIF (ExpDiff < 64)
{
    Quotient = Integer obtained by rounding (ST(0)/ST(1)) to nearest even integer
    ST(0) = ST(0) - (ST(1) * Quotient)
    SW.C2 = 0
    {SW.C0, SW.C3, SW.C1} = Quotient mod 8
}
ELSE
{
    N = 32 + (ExpDiff mod 32)
    Quotient = Truncate((ST(0)/ST(1))/2^(ExpDiff-N))
    ST(0) = ST(0) - (ST(1) * Quotient * 2^(ExpDiff-N))
    SW.C2 = 1
    {SW.C0, SW.C3, SW.C1} = 0
}
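The difference between the truncated quotient of FPREM and the rounded quotient of FPREM1 can be seen in a short Python sketch. It covers only the complete-remainder case in which the quotient is rounded to the nearest integer with ties going to the even integer; the helper name fprem1_complete is hypothetical, Python floats are double precision, and condition codes and exceptions are not modeled.

```python
import math
from fractions import Fraction

def fprem1_complete(st0: float, st1: float) -> float:
    """IEEE 754 style remainder using a round-to-nearest-even quotient."""
    ratio = Fraction(st0) / Fraction(st1)    # exact quotient
    quotient = round(ratio)                  # ties go to the even integer
    return float(Fraction(st0) - Fraction(st1) * quotient)

# 7/2 = 3.5 rounds to 4, so the remainder is negative here, whereas a
# truncated quotient (as in FPREM, or math.fmod) leaves a positive one.
print(fprem1_complete(7.0, 2.0), math.fmod(7.0, 2.0))
print(fprem1_complete(5.0, 2.0), math.remainder(5.0, 2.0))
```

Python's math.remainder follows the same IEEE 754 definition, so it serves as an independent cross-check of the sketch.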

Related Instructions 

FPREM, FABS, FRNDINT, FXTRACT, FCHS

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  M      Set equal to the value of bit 2 of the quotient.
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
                    M      Set equal to the value of bit 0 of the quotient, if there was no fault.
C2                  0      FPREM1 generated the partial remainder.
                    1      The source operands differed by more than a factor of 2^64, so the result is incomplete.
C3                  M      Set equal to the value of bit 1 of the quotient.

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.

x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          ST(0) was ±infinity.
                                                        X     X             X          ST(1) was ±zero.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.

FPTAN Floating-Point Partial Tangent

Computes the tangent of the radian value in ST(0), stores the result in ST(0), and pushes a value of 1.0 
onto the x87 register stack. 

The source value must be between -2^63 and +2^63 radians. If the source value lies outside the specified
range, the instruction sets the C2 bit of the x87 status word to 1 and does not change the value in ST(0). 
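
The stack effect and the range check can be illustrated with a small model. This is a sketch under stated assumptions, not the hardware algorithm: the x87 register stack is represented by a Python list whose element 0 is ST(0), the function name fptan is hypothetical, values are 64-bit floats, and NaN and infinity inputs (which raise exceptions on the processor) are not modeled.

```python
import math

LIMIT = 2.0 ** 63  # the source must lie between -2^63 and +2^63

def fptan(stack):
    """Model FPTAN on a list where stack[0] is ST(0).

    Returns (new_stack, c2).
    """
    x = stack[0]
    if not (-LIMIT < x < LIMIT):
        # Out of range: C2 is set to 1 and ST(0) is left unchanged.
        return list(stack), 1
    # In range: ST(0) becomes tan(x), then 1.0 is pushed on top of it.
    return [1.0, math.tan(x)] + list(stack[1:]), 0
```

After a successful call the tangent is found in element 1 (the new ST(1)) and the constant 1.0 in element 0.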


Mnemonic  Opcode  Description
FPTAN     D9 F2   Replace ST(0) with the tangent of ST(0), then push 1.0 onto the x87 register stack.

Related Instructions

FCOS, FPATAN, FSIN, FSINCOS

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code  Value  Description
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          A source operand was ±infinity.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
                                                        X     X             X          An x87 stack overflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FRNDINT Floating-Point Round to Integer

Rounds the value in ST(0) to an integer, depending on the setting of the rounding control (RC) field of 
the x87 control word, and stores the result in ST(0). 

If the initial value in ST(0) is ∞, the instruction does not change ST(0). If the value in ST(0) is not an
integer, it sets the precision exception (PE) bit of the x87 status word to 1.
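
The dependence on the rounding control field can be sketched as follows. This is an illustrative model only: the function name frndint is hypothetical, the RC encodings used (0 = nearest even, 1 = toward negative infinity, 2 = toward positive infinity, 3 = toward zero) are the standard x87 control-word encodings and are not stated in this section, and NaN inputs and the sign of a zero result are not modeled.

```python
import math

# Assumed RC encodings from the x87 control word definition.
ROUNDERS = {
    0: round,       # round to nearest, ties to even
    1: math.floor,  # round toward negative infinity
    2: math.ceil,   # round toward positive infinity
    3: math.trunc,  # round toward zero
}

def frndint(st0: float, rc: int):
    """Return (rounded value, pe), where pe models the PE status bit."""
    if math.isinf(st0):
        return st0, False          # infinity is left unchanged
    result = float(ROUNDERS[rc](st0))
    return result, result != st0   # PE is set when the source was not integral
```

With RC = 0, the value 2.5 rounds to 2.0 because ties go to the even integer; with RC = 2 it rounds to 3.0. In both cases the modeled PE flag is set.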

Mnemonic Opcode Description 

FRNDINT D9 FC Round the contents of ST(0) to an integer. 

Related Instructions 

FABS, FPREM, FXTRACT, FCHS 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Precision exception (PE)                                X     X             X          The source operand was not an integral value.
FRSTOR Floating-Point Restore x87 and MMX™ State

Restores the complete x87 state from memory starting at the specified address, as stored by a previous 
call to F(N)SAVE. 

The FRSTOR instruction takes a memory operand that specifies the starting address of either a 94-byte 
or 108-byte area in memory. The 94-byte operand is required for a 16-bit operand-size; the 108-byte 
memory area is required for both 32-bit and 64-bit operand sizes. The layout of the saved x87 state 
within the specified memory area depends on whether the processor is operating in protected or real 
mode. See “Media and x87 Processor State” in Volume 2 for details on how this instruction stores the 
x87 environment in memory. (Because FSAVE does not save the full 64-bit data and instruction 
pointers, 64-bit applications should use FXSAVE/FXRSTOR, rather than FSAVE/FRSTOR.) 
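
The two area sizes follow from simple arithmetic: the x87 environment image (14 or 28 bytes, the sizes given for FSTENV) followed by eight 80-bit data registers of 10 bytes each. The sketch below only illustrates that arithmetic; the names are hypothetical.

```python
X87_REGISTERS = 8
BYTES_PER_REGISTER = 10   # one 80-bit double-extended-precision value

# Environment image size in bytes, keyed by operand size in bits.
ENV_BYTES = {16: 14, 32: 28, 64: 28}

def save_area_bytes(operand_size: int) -> int:
    """Size of the FSAVE/FRSTOR memory area for a given operand size."""
    return ENV_BYTES[operand_size] + X87_REGISTERS * BYTES_PER_REGISTER
```

This gives 14 + 80 = 94 bytes for a 16-bit operand size and 28 + 80 = 108 bytes for 32-bit and 64-bit operand sizes.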

Because the MMX registers are mapped onto the low 64 bits of the x87 floating-point registers, this 
operation also restores the MMX state. 

If FRSTOR results in set exception flags in the loaded x87 status word register, and these exceptions 
are unmasked in the x87 control word register, a floating-point exception occurs when the next 
floating-point instruction is executed (except for the no-wait floating-point instructions). 

To avoid generating exceptions when loading a new environment, use the FCLEX or FNCLEX 
instruction to clear the exception flags in the x87 status word before storing that environment. 

For details about the memory image restored by FRSTOR, see “Media and x87 Processor State” in 
Volume 2. 


Mnemonic             Opcode  Description
FRSTOR mem94/108env  DD /4   Load the x87 state from mem94/108env.


Related Instructions 

FSAVE, FNSAVE, FXSAVE, FXRSTOR 

rFLAGS Affected 

None 
x87 Condition Code

x87 Condition Code  Value  Description
C0                  M      Loaded from memory.
C1                  M      Loaded from memory.
C2                  M      Loaded from memory.
C3                  M      Loaded from memory.

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
FSAVE Floating-Point Save x87 and MMX™ State

FNSAVE 

Stores the complete x87 state to memory starting at the specified address and reinitializes the x87 state. 

The FSAVE instruction takes a memory operand that specifies the starting address of either a 94-byte 
or 108-byte area in memory. The 94-byte operand is required for a 16-bit operand-size; the 108-byte 
memory area is required for both 32-bit and 64-bit operand sizes. The layout of the saved x87 state 
within the specified memory area depends on whether the processor is operating in protected or real 
mode. See “Media and x87 Processor State” in Volume 2 for details on how this instruction stores the 
x87 environment in memory. (Because FSAVE does not save the full 64-bit data and instruction 
pointers, 64-bit applications should use FXSAVE/FXRSTOR, rather than FSAVE/FRSTOR.) 

Because the MMX registers are mapped onto the low 64 bits of the x87 floating-point registers, this 
operation also saves the MMX state. 

The FNSAVE instruction does not wait for pending unmasked x87 floating-point exceptions to be 
processed. 

Assemblers usually provide an FSAVE macro that expands into the instruction sequence 

WAIT ; Opcode 9B 

FNSAVE destination ; Opcode DD /6 

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler, if 
necessary. The FNSAVE instruction then stores the x87 state to the specified destination. 
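
The byte-level effect of the macro expansion can be shown by assembling the two encodings by hand. This is a minimal sketch that assumes the simplest addressing form only (a register-indirect memory operand with no displacement, ModRM.mod = 00); the helper names are hypothetical.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three ModRM fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

def encode_fnsave_indirect(rm: int) -> bytes:
    """Encode FNSAVE [reg] as DD /6 with a register-indirect operand."""
    if rm in (4, 5):
        # rm = 4 selects a SIB byte and rm = 5 a displacement-only form;
        # neither is covered by this sketch.
        raise ValueError("addressing form not modeled")
    return bytes([0xDD, modrm(0b00, 6, rm)])

def encode_fsave_indirect(rm: int) -> bytes:
    """FSAVE is WAIT (9B) followed by the FNSAVE encoding."""
    return b"\x9B" + encode_fnsave_indirect(rm)
```

With rm = 0 (the rAX register as the base), FNSAVE encodes as DD 30 and FSAVE as 9B DD 30.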


Mnemonic             Opcode    Description
FSAVE mem94/108env   9B DD /6  Copy the x87 state to mem94/108env after checking for pending floating-point exceptions, then reinitialize the x87 state.
FNSAVE mem94/108env  DD /6     Copy the x87 state to mem94/108env without checking for pending floating-point exceptions, then reinitialize the x87 state.


Related Instructions 

FRSTOR, FXSAVE, FXRSTOR 


rFLAGS Affected 

None 
x87 Condition Code

x87 Condition Code  Value  Description
C0                  0
C1                  0
C2                  0
C3                  0



Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
FSCALE Floating-Point Scale

Multiplies the floating-point value in ST(0) by 2 to the power of the integer portion of the floating-point
value in ST(1).

This instruction provides an efficient method of multiplying (or dividing) by integral powers of 2 
because, typically, it simply adds the integer value to the exponent of the value in ST(0), leaving the 
significand unaffected. However, if the value in ST(0) is a denormal value, the mantissa is also 
modified and the result may end up being a normalized number. Likewise, if overflow or underflow 
results from a scale operation, the mantissa of the resulting value will be different from that of the 
source. 
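
The scaling operation described above can be approximated with the standard library. The sketch below is an illustration on 64-bit Python floats, not a model of the 80-bit hardware behavior: the function name fscale is hypothetical, the integer portion of ST(1) is taken by truncation toward zero, and overflow (for which math.ldexp raises OverflowError) is not mapped to the x87 overflow exception.

```python
import math

def fscale(st0: float, st1: float) -> float:
    """Multiply st0 by 2 raised to the integer portion of st1."""
    # math.trunc drops the fraction of st1 (rounding toward zero);
    # math.ldexp then adjusts the exponent of st0 by that amount.
    return math.ldexp(st0, math.trunc(st1))
```

For example, fscale(3.0, 2.9) scales by 2^2 and returns 12.0, while fscale(3.0, -1.5) scales by 2^-1 and returns 1.5.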

The FSCALE instruction performs the reverse operation to that of the FXTRACT instruction. 

Mnemonic Opcode Description 

FSCALE D9 FD Replace ST(0) with ST(0) * 2^rndint(ST(1))

Related Instructions 

FSQRT, FPREM, FPREM1, FRNDINT, FXTRACT, FABS, FCHS 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U      Undefined.
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U      Undefined.
C3                  U      Undefined.

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FSIN Floating-Point Sine

Computes the sine of the radian value in ST(0) and stores the result in ST(0). 

The source value must be in the range -2^63 to +2^63 radians. If the value lies outside this range, the
instruction sets the C2 bit in the x87 status word to 1 and does not change the value in ST(0).
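
A caller normally tests C2 after the instruction and handles the out-of-range case itself. The sketch below models that pattern on 64-bit Python floats. The names fsin and sin_any are hypothetical, treating the range endpoints as in range is an assumption, and reducing the argument modulo 2π before retrying is one common response rather than something this section prescribes. Infinity and NaN inputs are not modeled.

```python
import math

LIMIT = 2.0 ** 63

def fsin(st0: float):
    """Return (new ST(0), c2) for a modeled FSIN."""
    if not (-LIMIT <= st0 <= LIMIT):
        return st0, 1              # out of range: ST(0) unchanged, C2 = 1
    return math.sin(st0), 0

def sin_any(x: float) -> float:
    """Retry with a reduced argument when the first attempt sets C2."""
    result, c2 = fsin(x)
    if c2:
        # Assumed recovery step: bring the argument into range first.
        result, c2 = fsin(math.remainder(x, 2.0 * math.pi))
    return result
```

The reduced argument is only as accurate as the floating-point value of 2π, so results for very large inputs carry little precision.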

Mnemonic Opcode Description 

FSIN D9 FE Replace ST(0) with the sine of ST(0). 

Related Instructions 

FCOS, FPATAN, FPTAN, FSINCOS 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  0      Source operand was in range.
                    1      Source operand was out of range.
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          A source operand was ±infinity.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FSINCOS Floating-Point Sine and Cosine

Computes the sine and cosine of the value in ST(0), stores the sine in ST(0), and pushes the cosine onto
the x87 register stack. The source value must be in the range -2^63 to +2^63 radians.

If the source operand is outside this range, the instruction sets the C2 bit in the x87 status word to 1 and 
does not change the value in ST(0). 
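
Because the cosine is pushed after the sine is stored, the two results end up in ST(0) (cosine) and ST(1) (sine). The sketch below illustrates this ordering with a Python list whose element 0 stands for ST(0). The function name is hypothetical, the endpoints of the range are assumed to be in range, and NaN and infinity inputs are not modeled.

```python
import math

LIMIT = 2.0 ** 63

def fsincos(stack):
    """Model FSINCOS on a list where stack[0] is ST(0).

    Returns (new_stack, c2).
    """
    x = stack[0]
    if not (-LIMIT <= x <= LIMIT):
        return list(stack), 1      # out of range: stack unchanged, C2 = 1
    # The sine replaces ST(0); the cosine is then pushed on top of it.
    return [math.cos(x), math.sin(x)] + list(stack[1:]), 0
```

Starting from a stack of [0.0, 7.0], the model returns [1.0, 0.0, 7.0]: the cosine on top, the sine beneath it, and the older entry moved down by one.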

Mnemonic  Opcode  Description
FSINCOS   D9 FB   Replace ST(0) with the sine of ST(0), then push the cosine of ST(0) onto the x87 register stack.

Related Instructions

FCOS, FPATAN, FPTAN, FSIN

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      No precision exception occurred.
                    0      x87 stack underflow, if an x87 register stack fault was detected.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
                    0      Result in ST(1) was rounded down, if a precision exception was detected.
                    1      Result in ST(1) was rounded up, if a precision exception was detected.
C2                  0      Source operand was in range.
                    1      Source operand was out of range.
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          A source operand was ±infinity.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
                                                        X     X             X          An x87 stack overflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FSQRT Floating-Point Square Root

Computes the square root of the value in ST(0) and stores the result in ST(0). Taking the square root of 
+infinity returns +infinity. 
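
The special cases can be summarized in a short model. This is a sketch on 64-bit Python floats with a hypothetical function name; returning a NaN together with an "invalid" indication for negative inputs is an assumption standing in for the invalid-operation exception listed for this instruction, and SNaN handling is not modeled.

```python
import math

def fsqrt(st0: float):
    """Return (result, invalid) for a modeled FSQRT."""
    # The comparison -0.0 < 0.0 is False, so -zero is not treated as negative.
    if st0 < 0.0:
        return math.nan, True      # assumed stand-in for the IE response
    return math.sqrt(st0), False
```

In this model +infinity maps to +infinity and -zero is passed through with its sign intact.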

Mnemonic Opcode Description 

FSQRT D9 FA Replace ST(0) with the square root of ST(0). 

Related Instructions 

FSCALE, FPREM, FPREM1, FRNDINT, FXTRACT, FABS, FCHS 

rFLAGS Affected 

None 

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          A source operand was a negative value (not including -zero).
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FST Floating-Point Store Stack Top

FSTP 

Copies the value in ST(0) to the specified floating-point register or memory location. 

The FSTP instruction pops the x87 stack after copying the value. The instruction FSTP ST(0) is the 
same as popping the stack with no data transfer. 

If the specified destination is a single-precision or double-precision memory location, the instruction
converts the value to the appropriate precision format. It does this by rounding the significand of the
source value as specified by the rounding mode determined by the RC field of the x87 control word
and then converting to the format of the destination. It also converts the exponent to the width and bias of
the destination format.

If the value is too large for the destination format, the instruction sets the overflow exception (OE) bit
of the x87 status word. Then, if the overflow exception is unmasked (OM bit cleared to 0 in the x87
control word), the instruction does not perform the store.

If the value is a denormal value, the instruction sets the underflow exception (UE) bit in the x87 status
word.

If the value is ±0, ±∞, or a NaN, the instruction truncates the least significant bits of the significand and
exponent to fit the destination location.
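
The effect of storing to a narrower format can be observed without x87 hardware by converting a 64-bit float to single precision and back. The sketch below uses Python's struct module for the conversion. The function name is hypothetical, the source here is a 64-bit float rather than an 80-bit register value, and the rounding is whatever the platform's double-to-float conversion performs (round to nearest on typical systems), not a selectable RC mode. Values too large for single precision make struct.pack raise OverflowError in this model.

```python
import struct

def store_single(st0: float):
    """Return (value as stored in a 32-bit real, precision_lost)."""
    # Pack as a little-endian 32-bit real, then read it back.
    stored = struct.unpack("<f", struct.pack("<f", st0))[0]
    # precision_lost models the precision exception (PE) condition:
    # the result could not be represented exactly in the destination.
    return stored, stored != st0
```

A value such as 0.5 survives the round trip exactly, whereas 0.1 does not, which corresponds to the precision exception case.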


Mnemonic        Opcode   Description
FST ST(i)       DD D0+i  Copy the contents of ST(0) to ST(i).
FST mem32real   D9 /2    Copy the contents of ST(0) to mem32real.
FST mem64real   DD /2    Copy the contents of ST(0) to mem64real.
FSTP ST(i)      DD D8+i  Copy the contents of ST(0) to ST(i) and pop the x87 register stack.
FSTP mem32real  D9 /3    Copy the contents of ST(0) to mem32real and pop the x87 register stack.
FSTP mem64real  DD /3    Copy the contents of ST(0) to mem64real and pop the x87 register stack.
FSTP mem80real  DB /7    Copy the contents of ST(0) to mem80real and pop the x87 register stack.

Related Instructions

FFREE, FLD, FILD, FIST, FISTP, FBLD, FBSTP

rFLAGS Affected

None
x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  0      x87 stack underflow, if an x87 register stack fault was detected.
                    1      x87 stack overflow, if an x87 register stack fault was detected.
                    0      Result was rounded down, if a precision exception was detected.
                    1      Result was rounded up, if a precision exception was detected.
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
                                                        X     X             X          An x87 stack overflow occurred.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
FSTCW Floating-Point Store Control Word

(FNSTCW)

Stores the x87 control word in the specified 2-byte memory location. The FNSTCW instruction does
not check for possible floating-point exceptions before copying the image of the x87 control word.

Assemblers usually provide an FSTCW macro that expands into the instruction sequence: 

WAIT ; Opcode 9B 

FNSTCW destination ; Opcode D9 /7 

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler, if
necessary. The FNSTCW instruction then stores the state of the x87 control register to the desired
destination.
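
Once the control word has been stored to memory, software usually picks individual fields out of the 16-bit value. The sketch below shows one way to do that. The function name is hypothetical, and the field positions used (exception masks in bits 5:0, precision control in bits 9:8, rounding control in bits 11:10) come from the standard x87 control word layout, which is not restated in this section.

```python
def decode_control_word(cw: int) -> dict:
    """Split a stored x87 control word into its main fields."""
    return {
        "exception_masks": cw & 0x3F,   # assumed bits 5:0 (IM, DM, ZM, OM, UM, PM)
        "PC": (cw >> 8) & 0x3,          # assumed bits 9:8, precision control
        "RC": (cw >> 10) & 0x3,         # assumed bits 11:10, rounding control
    }
```

For the value 037Fh, this returns all six exception masks set, PC = 3, and RC = 0.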


Mnemonic        Opcode    Description
FSTCW mem2env   9B D9 /7  Perform a WAIT (9B) to check for pending floating-point exceptions, then copy the x87 control word to mem2env.
FNSTCW mem2env  D9 /7     Copy the x87 control word to mem2env without checking for floating-point exceptions.

Related Instructions

FSTSW, FNSTSW, FSTENV, FNSTENV

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  U
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
FSTENV Floating-Point Store Environment

(FNSTENV) 

Stores the current x87 environment to memory starting at the specified address, and then masks all 
floating-point exceptions. The x87 environment consists of the x87 control, status, and tag word 
registers, the last non-control x87 instruction pointer, the last x87 data pointer, and the opcode of the 
last completed non-control x87 instruction. 

The FSTENV instruction takes a memory operand that specifies the start of either a 14-byte or 28-byte
area in memory. The 14-byte operand is required for a 16-bit operand-size; the 28-byte memory area is
required for both 32-bit and 64-bit operand sizes. The layout of the saved x87 environment within the
specified memory area depends on whether the processor is operating in protected or real mode. See
“Media and x87 Processor State” in Volume 2 for details on how this instruction stores the x87
environment in memory. (Because FLDENV/FSTENV do not save the full 64-bit data and instruction
pointers, 64-bit applications should use FXSAVE/FXRSTOR, rather than FLDENV/FSTENV.)

The FNSTENV instruction does not check for possible floating-point exceptions before storing the
environment.

Assemblers usually provide an FSTENV macro that expands into the instruction sequence 

WAIT ; Opcode 9B 

FNSTENV destination ; Opcode D9 /6 

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler if 
necessary. The FNSTENV instruction then stores the state of the x87 environment to the specified 
destination. 

Exception handlers often use these instructions because they provide access to the x87 instruction and 
data pointers. An exception handler typically saves the environment on the stack. The instructions 
mask all floating-point exceptions after saving the environment to prevent those exceptions from 
interrupting the exception handler. 
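
The order of the two effects matters: the environment is saved first, and only then are the exceptions masked, so the saved image still holds the caller's mask settings. The sketch below models just the control-word part of this behavior. The function name is hypothetical, and placing the six exception mask bits in bits 5:0 of the control word is taken from the standard x87 layout rather than from this section.

```python
EXCEPTION_MASK_BITS = 0x3F   # assumed: six mask bits in control word bits 5:0

def fnstenv_control_word(cw: int):
    """Return (control word saved to memory, control word left in the x87)."""
    saved = cw                        # the image stored in the environment
    live = cw | EXCEPTION_MASK_BITS   # all exceptions masked afterwards
    return saved, live
```

An exception handler that later reloads the saved environment therefore restores the original mask settings.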

Mnemonic             Opcode    Description
FSTENV mem14/28env   9B D9 /6  Perform a WAIT (9B) to check for pending floating-point exceptions, then copy the x87 environment to mem14/28env and mask the floating-point exceptions.
FNSTENV mem14/28env  D9 /6     Copy the x87 environment to mem14/28env without checking for pending floating-point exceptions, and mask the exceptions.

Related Instructions

FLDENV, FSTSW, FNSTSW, FSTCW, FNSTCW
rFLAGS Affected

None

x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  U
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
FSTSW Floating-Point Store Status Word

(FNSTSW) 

Stores the current state of the x87 status word register in either the AX register or a specified two-byte 
memory location. The image of the status word placed in the AX register always reflects the result 
after the execution of the previous x87 instruction. 

The AX form of the instruction is useful for performing conditional branching operations based on the
values of x87 condition flags.
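
To branch on the condition flags, software tests individual bits of the 16-bit value copied to AX. The sketch below shows the bit extraction in Python. The function name is hypothetical, and the bit positions (C0 in bit 8, C1 in bit 9, C2 in bit 10, TOP in bits 13:11, C3 in bit 14) are those of the standard x87 status word layout, which this section does not restate.

```python
def decode_status_word(sw: int) -> dict:
    """Extract the condition codes and stack-top pointer from a status word."""
    return {
        "C0": (sw >> 8) & 1,
        "C1": (sw >> 9) & 1,
        "C2": (sw >> 10) & 1,
        "C3": (sw >> 14) & 1,
        "TOP": (sw >> 11) & 0x7,   # index of the register currently ST(0)
    }
```

For example, the value 4500h decodes to C3 = 1, C2 = 1, C1 = 0, and C0 = 1, with TOP equal to 0.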

The FNSTSW instruction does not check for possible floating-point exceptions before storing the x87 
status word. 

Assemblers usually provide an FSTSW macro that expands into the instruction sequence: 

WAIT ; Opcode 9B 

FNSTSW destination ; Opcode DD /7 or DF E0 

The WAIT (9Bh) instruction checks for pending x87 exceptions and calls an exception handler if 
necessary. The FNSTSW instruction then stores the state of the x87 status register to the desired 
destination. 


Mnemonic        Opcode    Description
FSTSW AX        9B DF E0  Perform a WAIT (9B) to check for pending floating-point exceptions, then copy the x87 status word to the AX register.
FSTSW mem2env   9B DD /7  Perform a WAIT (9B) to check for pending floating-point exceptions, then copy the x87 status word to mem2env.
FNSTSW AX       DF E0     Copy the x87 status word to the AX register without checking for pending floating-point exceptions.
FNSTSW mem2env  DD /7     Copy the x87 status word to mem2env without checking for pending floating-point exceptions.


Related Instructions 

FSTCW, FNSTCW, FSTENV, FNSTENV 

rFLAGS Affected 

None 
x87 Condition Code

x87 Condition Code  Value  Description
C0                  U
C1                  U
C2                  U
C3                  U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          The destination operand was in a nonwritable segment.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
FSUB Floating-Point Subtract

FSUBP 

FISUB 

Subtracts the value in a floating-point register or memory location from the value in another register 
and stores the result in that register. 

If no operands are specified, the instruction subtracts the value in ST(0) from that in ST(1) and stores 
the result in ST(1). 

If one operand is specified, it subtracts a floating-point or integer value in memory from the contents 
of ST(0) and stores the result in ST(0). 

If two operands are specified, it subtracts the value in ST(0) from the value in another floating-point 
register or vice versa. 

The FSUBP instruction pops the x87 register stack after performing the subtraction. 

The no-operand version of the instruction always pops the register stack. In some assemblers, the 
mnemonic for this instruction is FSUB rather than FSUBP. 

The FISUB instruction converts a signed integer value to double-extended-precision format before 
performing the subtraction. 
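
The register forms differ only in which register receives the result and whether the stack is popped. The sketch below models them on a Python list whose element 0 stands for ST(0). The function names are hypothetical, and condition codes and exceptions are not modeled.

```python
def fsub_st0_sti(stack, i):
    """FSUB ST(0),ST(i): ST(0) = ST(0) - ST(i)."""
    out = list(stack)
    out[0] = stack[0] - stack[i]
    return out

def fsub_sti_st0(stack, i):
    """FSUB ST(i),ST(0): ST(i) = ST(i) - ST(0)."""
    out = list(stack)
    out[i] = stack[i] - stack[0]
    return out

def fsubp(stack, i=1):
    """FSUBP ST(i),ST(0): subtract, then pop; with no operands, i is 1."""
    return fsub_sti_st0(stack, i)[1:]
```

Starting from [5.0, 2.0, 9.0], the no-operand FSUBP form computes ST(1) - ST(0) = -3.0 and pops, leaving [-3.0, 9.0].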


Mnemonic           Opcode   Description
FSUB ST(0),ST(i)   D8 E0+i  Replace ST(0) with ST(0) - ST(i).
FSUB ST(i),ST(0)   DC E8+i  Replace ST(i) with ST(i) - ST(0).
FSUB mem32real     D8 /4    Replace ST(0) with ST(0) - mem32real.
FSUB mem64real     DC /4    Replace ST(0) with ST(0) - mem64real.
FSUBP              DE E9    Replace ST(1) with ST(1) - ST(0) and pop the x87 register stack.
FSUBP ST(i),ST(0)  DE E8+i  Replace ST(i) with ST(i) - ST(0), and pop the x87 register stack.
FISUB mem16int     DE /4    Replace ST(0) with ST(0) - mem16int.
FISUB mem32int     DA /4    Replace ST(0) with ST(0) - mem32int.


Related Instructions 

FSUBRP, FISUBR, FSUBR 

rFLAGS Affected 

None 
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x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    0        No precision exception occurred.
                      0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U
C3                    U


Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          +infinity was subtracted from +infinity.
                                                        X     X             X          -infinity was subtracted from -infinity.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.



Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
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FSUBR Floating-Point Subtract Reverse 

FSUBRP 

FISUBR 

Subtracts the value in a floating-point register from the value in another register or a memory location, 
and stores the result in the first specified register. Values in memory can be in single-precision or 
double-precision floating-point, word integer, or short integer format. 

If one operand is specified, the instruction subtracts the value in ST(0) from the value in memory and 
stores the result in ST(0). 

If two operands are specified, it subtracts the value in ST(0) from the value in another floating-point 
register or vice versa. 

The FSUBRP instruction pops the x87 register stack after performing the subtraction.

The no-operand version of the instruction always pops the register stack. In some assemblers, the 
mnemonic for this instruction is FSUBR rather than FSUBRP. 

The FISUBR instruction converts a signed integer operand to double-extended-precision format 
before performing the subtraction. 

The FSUBR instructions perform the reverse operations of the FSUB instructions. 


Mnemonic              Opcode     Description
FSUBR ST(0),ST(i)     D8 E8+i    Replace ST(0) with ST(i) - ST(0).
FSUBR ST(i),ST(0)     DC E0+i    Replace ST(i) with ST(0) - ST(i).
FSUBR mem32real       D8 /5      Replace ST(0) with mem32real - ST(0).
FSUBR mem64real       DC /5      Replace ST(0) with mem64real - ST(0).
FSUBRP                DE E1      Replace ST(1) with ST(0) - ST(1) and pop the x87 stack.
FSUBRP ST(i),ST(0)    DE E0+i    Replace ST(i) with ST(0) - ST(i) and pop the x87 stack.
FISUBR mem16int       DE /5      Replace ST(0) with mem16int - ST(0).
FISUBR mem32int       DA /5      Replace ST(0) with mem32int - ST(0).
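
The register forms of FSUB and FSUBR use the same first opcode bytes with the E0+i and E8+i second bytes swapped. The following Python sketch builds the two-byte encodings from the opcode columns of the FSUB and FSUBR tables. The names `REGISTER_FORMS` and `encode_register_form` are hypothetical, and memory forms (which need a ModRM byte and addressing information) are not covered.

```python
# (mnemonic, operand form) -> (first opcode byte, base of second byte)
REGISTER_FORMS = {
    ("FSUB",   "ST(0),ST(i)"): (0xD8, 0xE0),
    ("FSUB",   "ST(i),ST(0)"): (0xDC, 0xE8),
    ("FSUBP",  "ST(i),ST(0)"): (0xDE, 0xE8),
    ("FSUBR",  "ST(0),ST(i)"): (0xD8, 0xE8),
    ("FSUBR",  "ST(i),ST(0)"): (0xDC, 0xE0),
    ("FSUBRP", "ST(i),ST(0)"): (0xDE, 0xE0),
}


def encode_register_form(mnemonic, form, i):
    """Return the two opcode bytes for a register-to-register subtract."""
    if not 0 <= i <= 7:
        raise ValueError("stack register index must be 0..7")
    first, base = REGISTER_FORMS[(mnemonic, form)]
    return bytes([first, base + i])


print(encode_register_form("FSUBRP", "ST(i),ST(0)", 1).hex())  # dee1
print(encode_register_form("FSUB", "ST(0),ST(i)", 3).hex())    # d8e3
```

With i = 1, the FSUBP and FSUBRP results match the no-operand encodings DE E9 and DE E1 listed in the tables.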


Related Instructions 

FSUB, FSUBP, FISUB 

rFLAGS Affected 

None 
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x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U
C3                    U


Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                                          X             X          An unaligned memory reference was performed while alignment checking was enabled.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          +infinity was subtracted from +infinity.
                                                        X     X             X          -infinity was subtracted from -infinity.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
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FTST Floating-Point Test with Zero 

Compares the value in ST(0) with 0.0, and sets the condition code flags in the x87 status word as 
shown in the x87 Condition Code table below. The instruction ignores the sign distinction between 
-0.0 and +0.0. 

Mnemonic Opcode Description 

FTST D9 E4 Compare ST(0) to 0.0. 

Related Instructions 

FCOM, FCOMP, FCOMPP, FCOMI, FCOMIP, FICOM, FICOMP, FUCOMI, FUCOMIP, FUCOM, 
FUCOMP, FUCOMPP, FXAM 

rFLAGS Affected 

None 

x87 Condition Code 


C3    C2    C1    C0    Compare Result
0     0     0     0     ST(0) > 0.0
0     0     0     1     ST(0) < 0.0
1     0     0     0     ST(0) = 0.0
1     1     0     1     ST(0) was unordered
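
Software typically reads the result of FTST by storing the x87 status word (for example with FNSTSW) and testing the condition code bits. The Python sketch below decodes such a value according to the table above. It assumes the standard status word layout, in which C0 is bit 8, C2 is bit 10, and C3 is bit 14; the function name `ftst_result` is hypothetical.

```python
C0 = 1 << 8    # status word bit 8
C2 = 1 << 10   # status word bit 10
C3 = 1 << 14   # status word bit 14

_FTST_RESULTS = {
    (0, 0, 0): "ST(0) > 0.0",
    (0, 0, 1): "ST(0) < 0.0",
    (1, 0, 0): "ST(0) = 0.0",
    (1, 1, 1): "ST(0) was unordered",
}


def ftst_result(status_word):
    """Map the (C3, C2, C0) bits of a stored status word to the FTST outcome."""
    key = (
        1 if status_word & C3 else 0,
        1 if status_word & C2 else 0,
        1 if status_word & C0 else 0,
    )
    return _FTST_RESULTS.get(key, "not an FTST result")


print(ftst_result(0x0100))  # ST(0) < 0.0
print(ftst_result(0x4500))  # ST(0) was unordered
```

The same decoding applies to the FUCOM family, with "0.0" replaced by the source operand.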


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value, a QNaN value, or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
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FUCOM Floating-Point Unordered Compare 

FUCOMP 

FUCOMPP 

Compares the value in ST(0) to the value in another x87 register, and sets the condition codes in the 
x87 status word as shown in the x87 Condition Code table below. 

If no source operand is specified, the instruction compares the value in ST(0) to that in ST(1).

After making the comparison, the FUCOMP instruction pops the x87 register stack and the
FUCOMPP instruction pops the x87 register stack twice.

The instruction carries out the same comparison operation as the FCOM instructions, but sets the 
invalid-operation exception (IE) bit in the x87 status word to 1 when either or both operands are an 
SNaN or are in an unsupported format. If either or both operands is a QNaN, it sets the condition code 
flags to unordered, but does not set the IE bit. The FCOM instructions, on the other hand, raise an IE 
exception when either or both of the operands are a NaN value or are in an unsupported format. 

Support for the FCOM(P(P)) instruction can be determined by executing either CPUID function
0000_0001h or CPUID function 8000_0001h. Support is indicated when both the EDX[FPU] (bit 0)
and EDX[CMOV] (bit 15) feature flags are set.
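
The feature test described above reduces to checking two bits of the EDX value returned by CPUID. The Python sketch below shows only the bit test; it assumes the EDX value has already been obtained by other means, and the function name `fpu_and_cmov_supported` is hypothetical.

```python
FPU_BIT = 0    # EDX[FPU]
CMOV_BIT = 15  # EDX[CMOV]


def fpu_and_cmov_supported(edx):
    """Return True when both EDX[FPU] and EDX[CMOV] are set."""
    has_fpu = (edx >> FPU_BIT) & 1
    has_cmov = (edx >> CMOV_BIT) & 1
    return bool(has_fpu and has_cmov)


print(fpu_and_cmov_supported(0x00008001))  # True: bits 0 and 15 set
print(fpu_and_cmov_supported(0x00000001))  # False: CMOV bit clear
```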


Mnemonic        Opcode     Description
FUCOM           DD E1      Compare ST(0) to ST(1) and set condition code flags to reflect the results of the comparison.
FUCOM ST(i)     DD E0+i    Compare ST(0) to ST(i) and set condition code flags to reflect the results of the comparison.
FUCOMP          DD E9      Compare ST(0) to ST(1), set condition code flags to reflect the results of the comparison, and pop the x87 register stack.
FUCOMP ST(i)    DD E8+i    Compare ST(0) to ST(i), set condition code flags to reflect the results of the comparison, and pop the x87 register stack.
FUCOMPP         DA E9      Compare ST(0) to ST(1), set condition code flags to reflect the results of the comparison, and pop the x87 register stack twice.


Related Instructions 

FCOM, FCOMPP, FCOMI, FCOMIP, FICOM, FICOMP, FTST, FUCOMI, FUCOMIP, FXAM 

rFLAGS Affected 

None 
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x87 Condition Code 


C3    C2    C1    C0    Compare Result
0     0     0     0     ST(0) > source
0     0     0     1     ST(0) < source
1     0     0     0     ST(0) = source
1     1     0     1     Operands were unordered


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
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FUCOMI Floating-Point Unordered Compare and Set 

FUCOMIP eFLAGS 

Compares the contents of ST(0) with the contents of another floating-point register, and sets the zero 
flag (ZF), parity flag (PF), and carry flag (CF) as shown in the rFLAGS Affected table below. 

Unlike FCOMI and FCOMIP, the FUCOMI and FUCOMIP instructions do not set the invalid- 
operation exception (IE) bit in the x87 status word for QNaNs. 

After completing the comparison, FUCOMIP pops the x87 register stack. 

Support for the FCOMI(P) instruction can be determined by executing either CPUID function
0000_0001h or CPUID function 8000_0001h. Support is indicated when both the EDX[FPU] (bit 0)
and EDX[CMOV] (bit 15) feature flags are set.


Mnemonic               Opcode     Description
FUCOMI ST(0),ST(i)     DB E8+i    Compare ST(0) to ST(i) and set eFLAGS to reflect the result of the comparison.
FUCOMIP ST(0),ST(i)    DF E8+i    Compare ST(0) to ST(i), set eFLAGS to reflect the result of the comparison, and pop the x87 register stack.


Related Instructions 

FCOM, FCOMPP, FCOMI, FCOMIP, FICOM, FICOMP, FTST, FUCOM, FUCOMP, FUCOMPP, 
FXAM 

rFLAGS Affected 


ZF    PF    CF    Compare Result
0     0     0     ST(0) > source
0     0     1     ST(0) < source
1     0     0     ST(0) = source
1     1     1     Operands were unordered
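
The flag settings in the table above can be modeled as follows. This Python sketch is illustrative only: the function names `fucomi_flags` and `taken_branches` are hypothetical, Python floats stand in for x87 values, and no distinction is made between QNaN and SNaN operands (an SNaN would additionally raise the invalid-operation exception). The branch conditions shown are the standard ones for JA, JB, JE, and JP.

```python
import math


def fucomi_flags(st0, src):
    """Return (ZF, PF, CF) as FUCOMI would set them for the two operands."""
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1)  # unordered
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)      # equal; -0.0 and +0.0 compare equal


def taken_branches(zf, pf, cf):
    """Which conditional jumps would be taken for a given flag state."""
    return {
        "JA": cf == 0 and zf == 0,
        "JB": cf == 1,
        "JE": zf == 1,
        "JP": pf == 1,
    }


print(fucomi_flags(1.0, 2.0))                             # (0, 0, 1)
print(taken_branches(*fucomi_flags(float("nan"), 1.0)))   # JB, JE, and JP all taken
```

Because the unordered result sets ZF and CF as well as PF, code that must treat NaN operands separately tests PF (for example with JP) before testing for below or equal.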


x87 Condition Code 


x87 Condition Code    Value    Description
C0
C1                    0
C2
C3



Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                                     X     X             X          The conditional move instructions are not supported, as indicated by EDX[FPU] and EDX[CMOV] returned by CPUID function 0000_0001h or 8000_0001h.
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
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FWAIT Wait for Unmasked x87 Floating-Point 

(WAIT) Exceptions 

Forces the processor to test for pending unmasked floating-point exceptions before proceeding. 

If there is a pending floating-point exception and CR0.NE = 1, a numeric exception (#MF) is
generated. If there is a pending floating-point exception and CR0.NE = 0, FWAIT asserts the FERR
output signal, then waits for an external interrupt.
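
The two cases above can be summarized as a small decision function. This Python sketch is a restatement of the paragraph, not a model of the hardware; the function name `fwait_action` and the returned strings are hypothetical.

```python
def fwait_action(pending_unmasked_exception, cr0_ne):
    """Describe what FWAIT does for a given pending-exception state and CR0.NE."""
    if not pending_unmasked_exception:
        return "continue"
    if cr0_ne:
        return "#MF"  # numeric exception reported internally
    return "assert FERR, wait for external interrupt"


for pending in (False, True):
    for ne in (0, 1):
        print(pending, ne, "->", fwait_action(pending, ne))
```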

This instruction is useful for ensuring that unmasked floating-point exceptions are handled before
altering the results of a floating-point instruction.

FWAIT and WAIT are synonyms for the same opcode. 


Mnemonic Opcode Description 

FWAIT 9B Check for any pending floating-point exceptions. 

Related Instructions 

None 

rFLAGS Affected 

None 

x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    U
C2                    U
C3                    U


Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The monitor coprocessor bit (MP) and the task switch bit (TS) of the control register (CR0) were both set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
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FXAM Floating-Point Examine 

Examines the value in ST(0) and sets the C0, C2, and C3 condition code flags in the x87 status word as
shown in the x87 Condition Code table below to indicate whether the value is a NaN, infinity, zero,
empty, denormal, normal finite, or unsupported value. The instruction also sets the C1 flag to indicate
the sign of the value in ST(0) (0 = positive, 1 = negative).

Mnemonic Opcode Description 

FXAM D9 E5 Characterize the number in the ST(0) register. 

Related Instructions 

FCOM, FCOMP, FCOMPP, FCOMI, FCOMIP, FICOM, FICOMP, FTST, FUCOM, FUCOMI, 
FUCOMIP, FUCOMP, FUCOMPP 

rFLAGS Affected 

None 

x87 Condition Code 


C3    C2    C1    C0    Meaning
0     0     0     0     +unsupported format
0     0     0     1     +NaN
0     0     1     0     -unsupported format
0     0     1     1     -NaN
0     1     0     0     +normal
0     1     0     1     +infinity
0     1     1     0     -normal
0     1     1     1     -infinity
1     0     0     0     +0
1     0     0     1     +empty
1     0     1     0     -0
1     0     1     1     -empty
1     1     0     0     +denormal
1     1     1     0     -denormal
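
In the table above, C1 carries the sign and the triple (C3, C2, C0) selects the class. The Python sketch below decodes a stored x87 status word on that basis. It assumes the standard status word layout (C0 = bit 8, C1 = bit 9, C2 = bit 10, C3 = bit 14); the names `FXAM_CLASSES` and `fxam_decode` are hypothetical.

```python
# (C3, C2, C0) -> class name, taken from the FXAM condition code table
FXAM_CLASSES = {
    (0, 0, 0): "unsupported format",
    (0, 0, 1): "NaN",
    (0, 1, 0): "normal",
    (0, 1, 1): "infinity",
    (1, 0, 0): "0",
    (1, 0, 1): "empty",
    (1, 1, 0): "denormal",
}


def fxam_decode(status_word):
    """Return a string such as '+normal' or '-infinity' for a status word."""
    c0 = (status_word >> 8) & 1
    c1 = (status_word >> 9) & 1
    c2 = (status_word >> 10) & 1
    c3 = (status_word >> 14) & 1
    kind = FXAM_CLASSES.get((c3, c2, c0), "not listed")
    return ("-" if c1 else "+") + kind


print(fxam_decode(0x0400))  # +normal
print(fxam_decode(0x0700))  # -infinity
print(fxam_decode(0x4200))  # -0
```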
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Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
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FXCH Floating-Point Exchange 

Exchanges the value in ST(0) with the value in any other x87 register. If no operand is specified, the 
instruction exchanges the values in ST(0) and ST(1). 

Use this instruction to move a value from an x87 register to ST(0) for subsequent processing by a 
floating-point instruction that can only operate on ST(0). 


Mnemonic      Opcode     Description
FXCH          D9 C9      Exchange the contents of ST(0) and ST(1).
FXCH ST(i)    D9 C8+i    Exchange the contents of ST(0) and ST(i).

Related Instructions

FLD, FST, FSTP

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    0
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.

Exceptions

Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
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FXRSTOR Restore XMM, MMX™, and x87 State 

Restores the XMM, MMX, and x87 state. The data loaded from memory is the state information 
previously saved using the FXSAVE instruction. Restoring data with FXRSTOR that had been 
previously saved with an FSAVE (rather than FXSAVE) instruction results in an incorrect restoration. 

If FXRSTOR results in set exception flags in the loaded x87 status word register, and these exceptions 
are unmasked in the x87 control word register, a floating-point exception occurs when the next 
floating-point instruction is executed (except for the no-wait floating-point instructions). 

If the restored MXCSR register contains a set bit in an exception status flag, and the corresponding 
exception mask bit is cleared (indicating an unmasked exception), loading the MXCSR register from 
memory does not cause a SIMD floating-point exception (#XF). 

FXRSTOR does not restore the x87 error pointers (last instruction pointer, last data pointer, and last 
opcode), except when FXRSTOR sets FSW.ES=1 after recomputing it from the error mask bits in 
FCW and error status bits in FSW. 

The architecture supports two 512-byte memory formats for FXRSTOR: a 64-bit format that loads
XMM0-XMM15, and a 32-bit legacy format that loads only XMM0-XMM7. If FXRSTOR is
executed in 64-bit mode, the 64-bit format is used; otherwise the 32-bit format is used. When the
64-bit format is used, FXRSTOR loads the x87 pointer registers as offset64 if the operand size is
64-bit; otherwise it loads them as sel:offset32. For details about the memory format used by FXRSTOR, see
"Saving Media and x87 Processor State" in Volume 2.

If the fast-FXSAVE/FXRSTOR (FFXSR) feature is enabled in EFER, FXRSTOR does not restore the 
XMM registers (XMM0-XMM15) when executed in 64-bit mode at CPL 0. MXCSR is restored 
whether fast-FXSAVE/FXRSTOR is enabled or not. 

Support for the fast-FXSAVE/FXRSTOR feature is indicated by CPUID 
Fn8000_0001_EDX[FFXSR] = 1. 

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, the saved 
image of XMM0-XMM15 and MXCSR is not loaded into the processor. A general-protection 
exception occurs if the FXRSTOR instruction attempts to load non-zero values into reserved MXCSR 
bits. Software can use MXCSR_MASK to determine which bits of MXCSR are reserved. For details 
on the MXCSR_MASK, see “SSE, MMX, and x87 Programming” in Volume 2. 
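
The reserved-bit check can be expressed with a simple mask test. The Python sketch below is illustrative: the function names are hypothetical, and it assumes the commonly documented convention that an MXCSR_MASK field of zero in the saved image means the default mask 0000_FFBFh applies. Consult Volume 2 for the authoritative rule.

```python
DEFAULT_MXCSR_MASK = 0x0000FFBF  # assumed default when the saved field is 0


def effective_mxcsr_mask(saved_mask):
    """Return the mask of writable MXCSR bits for a saved MXCSR_MASK field."""
    return saved_mask if saved_mask != 0 else DEFAULT_MXCSR_MASK


def mxcsr_is_loadable(mxcsr, saved_mask):
    """True if no reserved bit is set, so FXRSTOR would not raise #GP for it."""
    reserved = ~effective_mxcsr_mask(saved_mask) & 0xFFFFFFFF
    return (mxcsr & reserved) == 0


print(mxcsr_is_loadable(0x1F80, 0x0000FFFF))  # True: all exceptions masked, no reserved bits
print(mxcsr_is_loadable(0x10000, 0x0000FFFF)) # False: bit 16 is reserved under this mask
```

Software that builds an FXRSTOR image by hand can AND the MXCSR value with the effective mask before storing it, which avoids the general-protection exception.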

Support for this instruction is implementation-specific. CPUID Fn8000_0001_EDX[FXSR] = 1 or 
CPUID Fn0000_0001_EDX[FXSR] = 1 indicates support for the FXSAVE and FXRSTOR 
instructions. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic             Opcode      Description
FXRSTOR mem512env    0F AE /1    Restores XMM, MMX™, and x87 state from 512-byte memory location.
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Related Instructions 

FWAIT, FXSAVE 

rFLAGS Affected 

None 


MXCSR Flags Affected 


MM    FZ    RC          PM    UM    OM    ZM    DM    IM    DAZ   PE    UE    OE    ZE    DE    IE
M     M     M     M     M     M     M     M     M     M     M     M     M     M     M     M     M
17    15    14    13    12    11    10    9     8     7     6     5     4     3     2     1     0

Note: A flag that can be set to one or zero is M (modified). Unaffected flags are blank. Shaded fields are reserved. 


Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                                     X     X             X          The FXSAVE/FXRSTOR instructions are not supported, as indicated by EDX[FXSR] = 0 returned by CPUID Fn0000_0001 or Fn8000_0001.
                                                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
Device not available, #NM                               X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit, or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded the data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
                                                        X     X             X          The memory operand was not aligned on a 16-byte boundary.
                                                        X     X             X          Ones were written to the reserved bits in MXCSR.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
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FXSAVE Save XMM, MMX™, and x87 State 

Saves the XMM, MMX, and x87 state. A memory location that is not aligned on a 16-byte boundary 
causes a general-protection exception. 

Unlike FSAVE and FNSAVE, FXSAVE does not alter the x87 tag bits. The contents of the saved 
MMX/x87 data registers are retained, thus indicating that the registers may be valid (or whatever other 
value the x87 tag bits indicated prior to the save). To invalidate the contents of the MMX/x87 data 
registers after FXSAVE, software must execute an FINIT instruction. Also, FXSAVE (like FNSAVE) 
does not check for pending unmasked x87 floating-point exceptions. An FWAIT instruction can be 
used for this purpose. 

FXSAVE does not save the x87 pointer registers (last instruction pointer, last data pointer, and last 
opcode), except in the relatively rare cases in which the exception-summary (ES) bit in the x87 status 
word is set to 1, indicating that an unmasked x87 exception has occurred. 

The architecture supports two 512-byte memory formats for FXSAVE: a 64-bit format that saves
XMM0-XMM15, and a 32-bit legacy format that saves only XMM0-XMM7. If FXSAVE is executed
in 64-bit mode, the 64-bit format is used; otherwise the 32-bit format is used. When the 64-bit format is
used, FXSAVE saves the x87 pointer registers as offset64 if the operand size is 64-bit; otherwise it
saves them as sel:offset32. For more details about the memory format used by FXSAVE, see “Saving
Media and x87 Execution Unit State” in Volume 2.
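
The alignment requirement and the format selection described above can be sketched as follows. This Python function is a restatement of the rules in this section, not an implementation of the instruction; the name `fxsave_plan` and the tuple it returns are hypothetical, and the pointer layout of the 32-bit legacy format is left unspecified here (see Volume 2).

```python
def fxsave_plan(address, in_64bit_mode, operand_size_64=False):
    """Return (format name, number of XMM registers saved, x87 pointer form)."""
    if address % 16 != 0:
        # A misaligned memory operand causes a general-protection exception.
        raise ValueError("#GP: operand not aligned on a 16-byte boundary")
    if in_64bit_mode:
        pointer_form = "offset64" if operand_size_64 else "sel:offset32"
        return ("64-bit", 16, pointer_form)    # XMM0-XMM15
    return ("32-bit legacy", 8, None)          # XMM0-XMM7


print(fxsave_plan(0x1000, in_64bit_mode=True, operand_size_64=True))
print(fxsave_plan(0x1000, in_64bit_mode=False))
```

The sketch does not model the fast-FXSAVE/FXRSTOR feature or CR4.OSFXSR, both of which change which parts of the image are written.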

If the fast-FXSAVE/FXRSTOR (FFXSR) feature is enabled in EFER, FXSAVE does not save the 
XMM registers (XMM0-XMM15) when executed in 64-bit mode at CPL 0. MXCSR is saved whether 
fast-FXSAVE/FXRSTOR is enabled or not. Support for the fast-FXSAVE/FXRSTOR feature is 
indicated by CPUID Fn8000_0001_EDX[FFXSR] = 1. 

If the operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 is cleared to 0, FXSAVE 
does not save the image of XMM0-XMM15 or MXCSR. For details about the CR4.OSFXSR bit, see 
“FXSAVE and FXRSTOR Instructions” in Volume 2. 

Support for this instruction is implementation-specific. CPUID Fn8000_0001_EDX[FXSR] = 1 or 
CPUID Fn0000_0001_EDX[FXSR] = 1 indicates support for the FXSAVE and FXRSTOR 
instructions. See “CPUID” in Volume 3 for more information about the CPUID instruction. 


Mnemonic            Opcode      Description
FXSAVE mem512env    0F AE /0    Saves XMM, MMX™, and x87 state to 512-byte memory location.


Related Instructions 

FINIT, FNSAVE, FRSTOR, FSAVE, FXRSTOR, LDMXCSR, STMXCSR 

rFLAGS Affected 

None 
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MXCSR Flags Affected 

None 

Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                                     X     X             X          The FXSAVE/FXRSTOR instructions are not supported, as indicated by EDX[FXSR] = 0 returned by CPUID Fn0000_0001 or Fn8000_0001.
                                                        X     X             X          The emulate bit (EM) of CR0 was set to 1.
Device not available, #NM                               X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                              X     X             X          A memory address exceeded the stack segment limit, or was non-canonical.
General protection, #GP                                 X     X             X          A memory address exceeded the data segment limit or was non-canonical.
                                                                            X          A null data segment was used to reference memory.
                                                                            X          The destination operand was in a non-writable segment.
                                                        X     X             X          The memory operand was not aligned on a 16-byte boundary.
Page fault, #PF                                               X             X          A page fault resulted from the execution of the instruction.
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FXTRACT Floating-Point Extract Exponent and Significand 

Extracts the exponent and significand portions of the floating-point value in ST(0), stores the exponent 
in ST(0), and then pushes the significand onto the x87 register stack. After this operation, the new 
ST(0) contains a real number with the sign and value of the original significand and an exponent of 
3FFFh (biased value for true exponent of zero), and ST(1) contains a real number that is the value of 
the original value’s true (unbiased) exponent. 

The FXTRACT instruction is useful for converting a double-extended-precision number to its decimal 
representation. 

If the zero-divide-exception mask (ZM) bit of the x87 control word is set to 1 and the source value is
±0, then the instruction stores ±zero in ST(0) and an exponent value of -infinity in register ST(1).
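
The decomposition performed by FXTRACT can be approximated in Python with `math.frexp`. This is a sketch under stated assumptions: the function name `fxtract` is hypothetical, Python floats are double precision rather than double-extended precision, and only finite inputs are handled. `math.frexp` returns a mantissa with magnitude in [0.5, 1), so the result is rescaled to the [1, 2) significand range that corresponds to a true exponent of zero.

```python
import math


def fxtract(x):
    """Return (significand, exponent) as left in the new ST(0) and ST(1)."""
    if math.isnan(x) or math.isinf(x):
        raise ValueError("only finite inputs are modeled in this sketch")
    if x == 0.0:
        # Masked zero-divide case: significand keeps the sign of zero,
        # exponent is -infinity.
        return (x, -math.inf)
    mantissa, exponent = math.frexp(x)       # x == mantissa * 2**exponent
    return (mantissa * 2.0, float(exponent - 1))


print(fxtract(8.0))     # (1.0, 3.0)
print(fxtract(-0.75))   # (-1.5, -1.0)
```

Multiplying the significand by 2 raised to the exponent reproduces the original value, which is the property that makes the instruction useful in binary-to-decimal conversion routines.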


Mnemonic    Opcode    Description
FXTRACT     D9 F4     Extract the exponent and significand of ST(0), store the exponent in ST(0), and push the significand onto the x87 register stack.

Related Instructions 

FABS, FPREM, FRNDINT, FCHS 

rFLAGS Affected 

None 


x87 Condition Code 


x87 Condition Code    Value    Description
C0                    U
C1                    0        x87 stack underflow, if an x87 register stack fault was detected.
                      1        x87 stack overflow, if an x87 register stack fault was detected.
C2                    U
C3                    U


Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U. 
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Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) is set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
                                                        X     X             X          An x87 stack overflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Zero-divide exception (ZE)                              X     X             X          The source operand was ±zero.
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FYL2X Floating-Point y * Log 2 (x) 

Computes (ST(1) * log2(ST(0))), stores the result in ST(1), and pops the x87 register stack. The value
in ST(0) must be greater than zero.

If the zero-divide-exception mask (ZM) bit in the x87 control word is set to 1 and ST(0) contains
±zero, the instruction returns infinity with the opposite sign of the value in register ST(1).
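
The arithmetic of FYL2X, including the masked zero-divide response described above, can be sketched in Python. The function name `fyl2x` is hypothetical, Python floats are double precision rather than double-extended precision, and NaN and infinite operands are not modeled. Cases that this section lists as invalid operations are reported with `ValueError`.

```python
import math


def fyl2x(st0, st1):
    """Return ST(1) * log2(ST(0)), the value left in the new ST(0) after the pop."""
    if st0 == 0.0:
        if st1 == 0.0:
            raise ValueError("invalid operation: ST(0) and ST(1) both zero")
        # Masked zero-divide response: infinity with the sign opposite to ST(1).
        return -math.copysign(math.inf, st1)
    if st0 < 0.0:
        raise ValueError("invalid operation: ST(0) is negative")
    return st1 * math.log2(st0)


print(fyl2x(8.0, 2.0))    # 6.0
print(fyl2x(0.0, 2.0))    # -inf
print(fyl2x(0.0, -2.0))   # inf
```

Choosing the y operand as a constant turns the instruction into a logarithm of another base. For example, y = log10(2) yields log10(x), and y = ln(2) yields ln(x).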


Mnemonic    Opcode    Description
FYL2X       D9 F1     Replace ST(1) with ST(1) * log2(ST(0)), then pop the x87 register stack.

Related Instructions

FYL2XP1, F2XM1

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    0        No precision exception occurred.
                      0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
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Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN value or an unsupported format.
                                                        X     X             X          The source operand in ST(0) was a negative finite value (not -zero).
                                                        X     X             X          The source operand in ST(0) was +1 and the source operand in ST(1) was ±infinity.
                                                        X     X             X          The source operand in ST(0) was -infinity.
                                                        X     X             X          The source operand in ST(0) was ±zero or ±infinity and the source operand in ST(1) was ±zero.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Zero-divide exception (ZE)                              X     X             X          The source operand in ST(0) was ±zero and the source operand in ST(1) was a finite value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
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FYL2XP1 Floating-Point y * Log 2 (x+1) 

Computes (ST(1) * log2(ST(0) + 1.0)), stores the result in ST(1), and pops the x87 register stack. The
value in ST(0) must be in the range sqrt(1/2) - 1 to sqrt(2) - 1.
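
The following Python sketch mirrors the computation and the operand range stated above. The function name `fyl2xp1` is hypothetical, and `math.log1p` is used as a stand-in for the hardware's log2(x + 1) evaluation; treating an out-of-range operand as an error is a choice made for this sketch. The comparison at the end shows why a dedicated x + 1 form is useful: for very small x, adding 1.0 first discards the operand entirely.

```python
import math

LOWER = math.sqrt(0.5) - 1.0   # about -0.2929
UPPER = math.sqrt(2.0) - 1.0   # about  0.4142


def fyl2xp1(st0, st1):
    """Return ST(1) * log2(ST(0) + 1.0) for ST(0) within the stated range."""
    if not LOWER <= st0 <= UPPER:
        raise ValueError("ST(0) is outside sqrt(1/2) - 1 .. sqrt(2) - 1")
    return st1 * math.log1p(st0) / math.log(2.0)


tiny = 1e-20
print(fyl2xp1(tiny, 1.0))       # a small positive value
print(math.log2(1.0 + tiny))    # 0.0, because 1.0 + tiny rounds to 1.0
```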


Mnemonic    Opcode    Description
FYL2XP1     D9 F9     Replace ST(1) with ST(1) * log2(ST(0) + 1.0), then pop the x87 register stack.

Related Instructions

FYL2X, F2XM1

rFLAGS Affected

None

x87 Condition Code

x87 Condition Code    Value    Description
C0                    U
C1                    0        x87 stack underflow, if an x87 register stack fault was detected.
                      0        Result was rounded down, if a precision exception was detected.
                      1        Result was rounded up, if a precision exception was detected.
C2                    U
C3                    U

Note: A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. Undefined flags are U.
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Exceptions 


Exception                                               Real  Virtual 8086  Protected  Cause of Exception
Device not available, #NM                               X     X             X          The emulate bit (EM) or the task switch bit (TS) of the control register (CR0) was set to 1.
x87 floating-point exception pending, #MF               X     X             X          An unmasked x87 floating-point exception was pending.
x87 Floating-Point Exception Generated, #MF
Invalid-operation exception (IE)                        X     X             X          A source operand was an SNaN or unsupported format.
                                                        X     X             X          The source operand in ST(0) was ±0 and the source operand in ST(1) was ±infinity.
Invalid-operation exception (IE) with stack fault (SF)  X     X             X          An x87 stack underflow occurred.
Denormalized-operand exception (DE)                     X     X             X          A source operand was a denormal value.
Overflow exception (OE)                                 X     X             X          A rounded result was too large to fit into the format of the destination operand.
Underflow exception (UE)                                X     X             X          A rounded result was too small to fit into the format of the destination operand.
Precision exception (PE)                                X     X             X          A result could not be represented exactly in the destination format.
Appendix A Recommended Substitutions for 3DNow!™ Instructions


Table A-1 lists the deprecated 3DNow!™ instructions and the recommended substitutions.


Table A-1. Substitutions for 3DNow!™ Instructions


64-Bit 3DNow!™ Instruction  128-Bit SSE Instruction  64-Bit MMX™ Instruction  Notes
FEMMS                       N/A                      EMMS (MMX)
PAVGUSB                     PAVGB                    PAVGB                    SSE and MMX™ instructions round according to the current rounding mode; 3DNow!™ instructions always round up.
PF2ID                       CVTTPS2DQ
PF2IW                                                                         CVTTPS2DQ may be used if 16-bit result is not necessary.
PFACC                       HADDPS
PFADD                       ADDPS
PFCMPEQ                     CMPPS
PFCMPGE                     CMPPS
PFCMPGT                     CMPPS
PFMAX                       MAXPS                                             MAXPS may return -0.0.
PFMIN                       MINPS                                             MINPS may return -0.0.
PFMUL                       MULPS
PFNACC                      HSUBPS
PFPNACC                     ADDSUBPS                                          ADDSUBPS expects arguments in different positions from PFPNACC.
PFRCP                                                                         RCPSS may be used in conjunction with the Newton-Raphson algorithm.
PFRCPIT1                                                                      See PFRCP.
PFRCPIT2                                                                      See PFRCP.
PFRSQIT1                                                                      See PFRSQRT.
PFRSQRT                                                                       RSQRTSS may be used in conjunction with the Newton-Raphson algorithm.
PFSUB                       SUBPS
PFSUBR                                                                        SUBPS may be used.
PI2FD                       CVTDQ2PS                                          SSE instructions round according to the current rounding mode; 3DNow! instructions always truncate.
PI2FW
PMULHRW                                                                       PMULHW may be used if rounding is not necessary.
PSWAPD                      PSHUFD
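
The notes for PFRCP and PFRSQRT mention pairing a low-precision estimate with the Newton-Raphson algorithm. The sketch below shows one refinement step for each case in plain Python arithmetic. It is an illustration under stated assumptions, not code from this manual: refine_reciprocal, refine_rsqrt, and crude_estimate are hypothetical names, and crude_estimate merely perturbs an exact value to stand in for the kind of approximate result an instruction such as RCPSS or RSQRTSS would supply.

```python
def refine_reciprocal(a, x0):
    """One Newton-Raphson step toward 1/a, starting from the estimate x0."""
    return x0 * (2.0 - a * x0)


def refine_rsqrt(a, y0):
    """One Newton-Raphson step toward 1/sqrt(a), starting from the estimate y0."""
    return y0 * (1.5 - 0.5 * a * y0 * y0)


def crude_estimate(value, rel_err=2.0 ** -12):
    """Hypothetical stand-in for a low-precision hardware estimate.

    Scales the exact value by (1 + rel_err); the default error size is an
    assumption chosen for the example.
    """
    return value * (1.0 + rel_err)
```

Each step roughly squares the relative error of the starting estimate, so an estimate good to about 12 bits comes out good to roughly 23 or 24 bits after a single step. That is why one refinement is often enough for single-precision work.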
Index 


Numerics 


16-bit mode. xvii 

32-bit mode. xvii 

64-bit mode. xvii 

A 

addressing 

RIP-relative. xxii 

AES. xviii 

ASID. xviii 

B 

biased exponent. xviii 

C 

commit. xviii 

compatibility mode. xviii 

condition codes 

x87. 217 

CVTPD2PI. 3 

CVTPI2PD. 6 

CVTPI2PS. 8 

CVTPS2PI. 10 

CVTTPD2PI. 12 

CVTTPS2PI. 15 

D 

direct referencing. xviii 

displacements. xviii 

double quadword. xix 

doubleword. xix 

E 

eAX-eSP register. xxv 

effective address size. xix 

effective operand size. xix 

eFLAGS register. xxv 

eIP register. xxv 

element. xix 

EMMS. 17 

endian order. xxvii 

exceptions. xix 

exponent. xviii 

extended SSE. xix 

AES. xviii 

AVX. xviii 

FMA. xix 

FMA4. xix 


XOP. xxiv 

F 

F2XM1. 218 

FABS. 220 

FADD. 222 

FADDP. 222 

FBLD. 225 

FBSTP. 227 

FCHS. 229 

FCLEX. 230 

FCMOVcc. 232 

FCOM. 234 

FCOMI. 237 

FCOMIP. 237 

FCOMP. 234 

FCOMPP. 234 

FCOS. 239 

FDECSTP. 241 

FDIV. 243 

FDIVP. 243 

FDIVR. 246 

FDIVRP. 246 

FEMMS. 18 

FFREE. 249 

FIADD. 222 

FICOM. 250 

FICOMP. 250 

FIDIV. 243 

FIDIVR. 246 

FILD. 252 

FIMUL. 276 

FINCSTP. 254 

FINIT. 256 

FIST. 258 

FISTP. 258 

FISTTP. 261 

FISUB. 310 

FISUBR. 313 

FLD. 263 

FLD1. 265 

FLDCW. 266 

FLDENV. 268 

FLDL2E. 270 

FLDL2T. 271 

FLDLG2. 272 

FLDLN2. 273 

FLDPI. 274 
FLDZ. 275 

flush. xix 

FMUL. 276 

FMULP. 276 

FNCLEX. 230 

FNINIT. 256 

FNOP. 279 

FNSAVE. 22, 292 

FNSTCW. 304 

FNSTENV. 306 

FNSTSW. 308 

FPATAN. 280 

FPREM. 282 

FPREM1. 284 

FPTAN. 286 

FRNDINT. 288 

FRSTOR. 20, 290 

FSAVE. 22, 292 

FSCALE. 294 

FSIN. 296 

FSINCOS. 298 

FSQRT. 300 

FST. 302 

FSTCW. 304 

FSTENV. 306 

FSTP. 302 

FSTSW. 308 

FSUB. 310 

FSUBP. 310 

FSUBR. 313 

FSUBRP. 313 

FTST. 316 

FUCOM. 317 

FUCOMI. 319 

FUCOMIP. 319 

FUCOMP. 317 

FUCOMPP. 317 

FWAIT. 321 

FXAM. 322 

FXCH. 324 

FXRSTOR. 24, 325 

FXSAVE. 26, 327 

FXTRACT. 329 

FYL2X. 331 

FYL2XP1. 333 

I 

IGN. xx 

indirect. xx 

instructions 

3DNow!. 2 

3DNow! Extensions. 2 


3DNow!™. 1 

64-bit media. 1 

FXSAVE/FXRSTOR. 2 

MMX. 2 

MMX Extensions. 2 

SSE1. 2 

SSE2. 2 

x87. 217 

L 

legacy mode. xx 

legacy SSE. xx 

legacy x86. xx 

long mode. xx 

LSB. xxi 

lsb. xxi 

M 

mask. xxi 

MASKMOVQ. 28 

MBZ. xxi 

media instructions 

128-bit. xvii 

256-bit. xvii 

64-bit. xvii 

memory 

physical. xxii 

modes 

compatibility. xviii 

legacy. xx 

long. xx 

protected. xxii 

real. xxii 

virtual-8086. xxiv 

MOVD. 31 

MOVDQ2Q. 34 

MOVNTQ. 36 

MOVQ. 38 

MOVQ2DQ. 40 

MSB. xxi 

msb. xxi 

MSR. xxvi 

O 

octword. xxi 

offset. xxi 

overflow. xxi 

P 

packed. xxi 

PACKSSDW. 42 

PACKSSWB. 44 

PACKUSWB. 46 
PADDB. 48 

PADDD. 50 

PADDQ. 52 

PADDSB. 54 

PADDSW. 56 

PADDUSB. 58 

PADDUSW. 60 

PADDW. 62 

PAE. xxii 

PAND. 64 

PANDN. 66 

PAVGB. 68 

PAVGUSB. 70 

PAVGW. 72 

PCMPEQB. 74 

PCMPEQD. 76 

PCMPEQW. 78 

PCMPGTB. 80 

PCMPGTD. 82 

PCMPGTW. 84 

PEXTRW. 86 

PF2ID. 88 

PF2IW. 90 

PFACC. 92 

PFADD. 94 

PFCMPEQ. 96 

PFCMPGE. 98 

PFCMPGT. 101 

PFMAX. 103 

PFMIN. 105 

PFMUL. 107 

PFNACC. 109 

PFPNACC. 112 

PFRCP. 115 

PFRCPIT1. 118 

PFRCPIT2. 121 

PFRSQIT1. 124 

PFRSQRT. 127 

PFSUB. 130 

PFSUBR. 132 

physical memory. xxii 

PI2FD. 134 

PI2FW. 136 

PINSRW. 138 

PMADDWD. 140 

PMAXSW. 142 

PMAXUB. 144 

PMINSW. 146 

PMINUB. 148 

PMOVMSKB. 150 

PMULHRW. 152 


PMULHUW. 154 

PMULHW. 156 

PMULLW. 158 

PMULUDQ. 160 

POR. 162 

probe. xxii 

processor modes 

16-bit. xvii 

32-bit. xvii 

64-bit. xvii 

protected mode. xxii 

PSADBW. 164 

PSHUFW. 166 

PSLLD. 169 

PSLLQ. 171 

PSLLW. 173 

PSRAD. 175 

PSRAW. 177 

PSRLD. 179 

PSRLQ. 181 

PSRLW. 183 

PSUBB. 185 

PSUBD. 187 

PSUBQ. 189 

PSUBSB. 191 

PSUBSW. 193 

PSUBUSB. 195 

PSUBUSW. 197 

PSUBW. 199 

PSWAPD. 201 

PUNPCKHBW. 203 

PUNPCKHDQ. 205 

PUNPCKHWD. 207 

PUNPCKLBW. 209 

PUNPCKLDQ. 211 

PUNPCKLWD. 213 

PXOR. 215 

Q 

quadword. xxii 

R 

r8-r15. xxvi 

rAX-rSP. xxvi 

RAZ. xxii 

real address mode. See real mode 

real mode. xxii 

registers 

eAX-eSP. xxv 

eFLAGS. xxv 

eIP. xxv 

r8-r15. xxvi 
rAX-rSP. xxvi 

rFLAGS. xxvii 

rIP. xxvii 

relative. xxii 

reserved. xxii 

revision history. xiii 

REX. xxii 

rFLAGS register. xxvii 

rIP register. xxvii 

RIP-relative addressing. xxii 

S 

SBZ. xxiii 

scalar. xxiii 

set. xxiii 

SIB. xxiii 

SIMD. xxiii 

SSE Instructions. xxiii 

extended. xix 

legacy. xx 

SSE instructions 

AES. xviii 

AVX. xviii 

FMA. xix 

FMA4. xix 

SSE1. xxiii 

SSE2. xxiii 

SSE3. xxiii 

SSE4.1. xxiii 

SSE4.2. xxiii 

SSE4A. xxiii 

SSSE3. xxiii 

XOP. xxiv 

sticky bits. xxiv 

Streaming SIMD Extensions (SSE). xxiii 

T 

TSS. xxiv 

U 

underflow. xxiv 

V 

vector. xxiv 

virtual-8086 mode. xxiv 

W 

WAIT. 321 

X 

XOP 

Instructions. xxiv 

Prefix. xxiv 